Perl - Lesson 2: Variables, Subroutines and Objects

Variables
"In computer programming, a variable is a symbolic name given to some known or unknown quantity or value, for the purpose of allowing the name to be used independently of the value it represents ... A variable has three essential attributes: a symbolic name (also known as an identifier), a data location (generally in storage or memory, comprised of address and length), and the value, represented by the data contents of that location. These attributes are often assigned at separate times during the program execution ... Variables often also have a fourth attribute, a type or class which specifies the kind of information the variable stores." -- WikiPedia

Variables are essential components of any programming language. They provide a symbolic name for accessing data held in the computer's memory. Different programming languages have different ways to initialize and manipulate variables. In this post I will focus on the Perl programming language and, through a simple script, explain the different types of variables available and the common actions performed on them. I have embedded the explanations as comments within the script.

I will also explain a few other programming concepts, such as conditions and loops in Perl, through this script. So stay alert to learn them.

1. Scalar Variables

#!/usr/bin/perl 

use strict;
use warnings;

# Scalar
# ======

# A scalar variable holds a single value
#     * begins with the '$' sign
#     * strings, numbers and references are the typical scalar types
#     * notice there is no int, string or float declaration
#     * single or double quotes for strings or characters
#     * no distinction between a character and a string

# Assignment
#
my $value_num = 1;
my $value_str = "1";

# Printing 
print "Value number : $value_num\n"; # note that variables are interpolated inside a double-quoted string (unlike Java or C)
print "Value String : $value_str\n";

# Double quotes
print "Value String : $value_str\n";

# Single quotes
print 'Value String : $value_str\n'; # in single quotes neither the variable nor the \n is interpolated

# qq can be used as a double-quote delimiter
# q can be used as a single-quote delimiter

print qq{Value String : $value_str\n}; # similar to double quotes
print q{Value String : $value_str\n}; # similar to single quotes

# Joining

my $first_name = "vivek";
my $last_name  = "gopalan";

my $full_name1 = "$first_name $last_name"; 
my $full_name2 = $first_name . " " . $last_name;  # Dot is used for concatenating scalars

print "Full name1 : $full_name1\n";
print "Full name2 : $full_name2\n";

my $value_sum = $value_num + 1; # arithmetic operators: addition (+), subtraction (-), multiplication (*), division (/)

my $empty_num = undef; # initialize with the undefined value (similar to null)
undef $empty_num; # resets the value to undef; the variable itself remains declared

# string manipulation can be done using built-in functions - substr(), chr(), length()
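
The string functions mentioned above can be tried out with a short sketch like the following. The sample string and variable names are only illustrative; index() and uc() are additional built-in functions not listed in the comment above.

```perl
use strict;
use warnings;

my $name = "vivek gopalan";   # sample string used only for this sketch

my $len   = length($name);            # number of characters in the string
my $first = substr($name, 0, 5);      # 5 characters starting at offset 0
my $pos   = index($name, "gopalan");  # zero-based offset of the first match
my $upper = uc($name);                # upper-cased copy of the string
my $char  = chr(65);                  # character for the code point 65

print "Length : $len\n";
print "First  : $first\n";
print "Offset : $pos\n";
print "Upper  : $upper\n";
print "Char   : $char\n";
```

Note that substr() and index() both count positions from zero, just like array indices.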

2. Array Variables

# Array 
# ------
# Collection of Scalar variables in order - refers to multiple memory locations 
#  * begins with @
#  * zero based
#  * no need to define the initial size
#  * elements can be any type of scalar
#  * each element should be separated by comma
# Assignment
my @colors = ('red', 'blue', 'yellow');  # notice the circular bracket '()' 
my @mixed_scalars = ('red', 'blue', 'yellow',1);  # does not matter if the content is string or number or reference

my @empty_array =();

# Printing
print "Colors : @colors\n"; # print as space delimited string
print @colors;              # prints the elements with no separator when not inside double quotes

# common operations: get the size of the array, add, delete or update an element, get a subset of elements, join into a scalar, search for an element

# get element - zero-based
my $color = $colors[0]; # notice the square bracket '[]' to retrieve the elements from the array
my @subset = @colors[0..1]; # array slice: the '@' sigil with a range of indices returns the first two elements

# updating element
$colors[1]= 'gray';

# number of elements in the array (an array evaluated in scalar context gives its size)
my $color_array_size = @colors;
my $color_array_size_better_way = scalar @colors;
my $color_array_max_index_size = $#colors;  # index of the last element, i.e. array size - 1


# manipulation

my @new_colors = @colors;
push @new_colors, 'green'; # adds to the end of array
unshift @colors, 'green';  # adds to the beginning of array - reverse of push

my $last_color  = pop @new_colors; # removes the last element of the array
my $first_color = shift @new_colors; # removes the first element of the array - reverse of pop

# Foreach loop
foreach my $color_tmp(@colors) {
   print  "Color [foreach] : $color_tmp\n";
}

# Forloop
for (my $index=0 ; $index <= $#colors ; $index++) {
   print  "Color [for] : $colors[$index]\n";
}

# time to introduce the $_ variable
# $_ is the default variable: foreach uses it when no loop variable is declared
foreach (@colors) {
   print  " Color [for] : ";
   print ;      # with no argument, print outputs $_
   #print $_;   # equivalent explicit form
   print  "\n";
}

# search array elements
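
Searching an array, and joining its elements into a scalar, can be sketched with the built-in grep and join functions as below. This is one possible approach, shown on a fresh copy of the color list; the variable names are illustrative.

```perl
use strict;
use warnings;

my @colors = ('red', 'blue', 'yellow');

# grep in list context returns every element for which the block is true
my @with_l = grep { /l/ } @colors;

# grep in scalar context returns the number of matching elements
my $blue_count = grep { $_ eq 'blue' } @colors;

# searching over the indices gives the position of the first match
my ($blue_index) = grep { $colors[$_] eq 'blue' } 0 .. $#colors;

# join concatenates the elements into one scalar with a chosen separator
my $joined = join(', ', @colors);

print "Contain 'l' : @with_l\n";
print "Blue count  : $blue_count (index $blue_index)\n";
print "Joined      : $joined\n";
```

Inside the grep block, each element is available through the $_ variable introduced above.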


3. Hash Variables

Hash variables are similar to arrays, except that instead of integer indices, any scalar value (converted to a string) can be used as the index to store a value. Hash variables are also called dictionaries or associative arrays. The indices in a hash are referred to as keys, and each unique key maps to one scalar value. The main advantage of a hash is that a value can be looked up directly when its key is known, whereas finding a specific value in an array generally requires iterating through its elements.

my %personal_info = (
        "first_name" => "vivek",
        "last_name"  => "gopalan"
);  # notice the 'circular' brackets
my $fname1 = $personal_info{first_name}; # notice the 'curly' brackets

# 'first_name' and 'last_name' are the keys and "vivek" and "gopalan" are the values. A value can be any scalar; a key is always treated as a string.
# '=>' (the "fat comma") maps a key to its value; it behaves like a comma that also quotes the word on its left. Each key-value pair should be separated by ',' (comma)
#   * hash variables start with '%'
#   * circular brackets to store the keys and values (similar to arrays)
#   * pairs separated by commas
#   * each unique key is associated with one value

# other representation
# my %personal_info = ("first_name","vivek","last_name","gopalan"); # a plain list, as for an array: alternating elements are taken as keys and values.
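
The common actions on a hash (add, update, test for a key, delete, count and iterate) can be sketched as follows. The 'city' key and its value are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

my %personal_info = (
    first_name => "vivek",
    last_name  => "gopalan",
);

$personal_info{city}       = "unknown";  # add a new key (hypothetical example key)
$personal_info{first_name} = "Vivek";    # update the value of an existing key

my $has_last_name = exists $personal_info{last_name} ? "yes" : "no";

delete $personal_info{city};             # remove the key and its value

my $key_count = scalar keys %personal_info;  # number of keys in the hash

# a hash has no guaranteed key order, so the keys are sorted for stable output
foreach my $key (sort keys %personal_info) {
    print "$key => $personal_info{$key}\n";
}
print "Has last_name : $has_last_name, keys : $key_count\n";
```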


Summary

# A. Scalar variable
my $first_name = "vivek"; # notice the dollar sign
my $last_name  = "gopalan";

# B. Array variable

my @personal_info = ( "vivek", $last_name); # notice the circular bracket and @ symbol
my $fname = $personal_info[0]; # notice the 'square' bracket and the integer index

# C. Hash variable
my %personal_info = (
        first_name => "vivek",
        last_name  => "gopalan"
);  # notice the 'circular' brackets and % symbol
my $fname1 = $personal_info{first_name}; # notice the 'curly' brackets


Simple Puzzle

# Identify the mistakes in the following variable declarations.

# A
my @personal_info = ["vivek", "gopalan"];
# B
my %personal_info = {"first_name","vivek"};
# C
my %status = {"company" => "ABC company", "degrees" => ("B.Tech", "PhD") };

# D
my @details = ( ("vivek", "gopalan"), ("B.Tech", "PhD"));
my @name_details = $details[0];


Subroutines

Subroutines, or functions, are another essential component of a programming language. A subroutine wraps a set of statements that can be reused or that performs a specific task, which keeps the code organized. Subroutines usually take some input values, perform a specific task and then return output values, although a subroutine is not required to take input or produce output.

# subroutine to calculate sum of first 'n' integers.
# 'n' is the input value and the sum is the output value
sub get_sum {
   my ($n) = @_; # the first value of the @_ array is assigned to $n
   my $sum = 0;
   for (my $i = 1 ; $i <= $n ; $i++) {
      $sum += $i;
   }
   return $sum;
}

# Calling the function
my $numb = 10;
my $num_total = get_sum($numb); # the older form &get_sum($numb), with an 'ampersand', also works but is rarely needed in modern Perl.

# Notice that the arguments are not declared in the function definition statement (unlike Java or C).
# All the input arguments are passed in a special array - @_.
# @_ is the default array variable inside a subroutine.
# @_ is a flat list of scalars (arrays and hashes passed directly are flattened into it; to keep them intact, pass them as references -- defined in the next section)
# the 'return' statement returns a value to the caller. It can return a scalar or a list.
# If the return statement is not used then the value of the last evaluated expression is returned (not a good practice)

# Instead of 'my ($n) = @_' the following line can be used inside the subroutine.
my $n = shift; # inside a subroutine, shift and pop without an argument operate on @_.

# $_[0], $_[1] .. are used to access specific elements of the @_ array.
# the $n variable declared with 'my' within the get_sum function is lexically scoped (it can only be accessed within the function)
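
To illustrate how several arguments arrive in @_ and how a subroutine can return more than one value, here is a minimal sketch. The get_min_max subroutine is a hypothetical helper written for this example only.

```perl
use strict;
use warnings;

# get_min_max is a hypothetical helper: it receives any number of
# numeric arguments through @_ and returns the smallest and largest.
sub get_min_max {
    my @numbers = @_;                       # copy all arguments into a local array
    my ($min, $max) = ($numbers[0], $numbers[0]);
    foreach my $n (@numbers) {
        $min = $n if $n < $min;
        $max = $n if $n > $max;
    }
    return ($min, $max);                    # return a list of two values
}

# the returned list is assigned to two scalars at once
my ($min, $max) = get_min_max(4, 9, 2, 7);
print "min = $min, max = $max\n";
```

The parentheses on the left side of the assignment matter: they put the call in list context so that both returned values are captured.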

Time to introduce references

As explained above, variables are stored at some address in the computer's memory. A reference is essentially the memory address associated with a variable of any type (scalar, array or hash). It is important to remember that a reference to any type of variable is always a scalar, because it is a single value that identifies where the variable lives. For arrays and hashes, you can think of the reference as the unique memory identifier associated with the variable.
# scalar reference
my $first_name = "vivek";
my $first_name_ref = \$first_name; # notice the 'backslash'

my $first_name0 = $$first_name_ref; # dereferencing: extra 'dollar' sign

# Array reference
my @personal_info = ("vivek", "gopalan");
my $personal_info_array_ref = \@personal_info; # notice the 'backslash'
my @personal_info0 = @$personal_info_array_ref; # dereferencing: get the array back from reference

my $first_name1 = $personal_info_array_ref->[0]; # dereferencing: notice the '->' operator to extract the data

# Hash reference
my %personal_info = ("first_name","vivek");
my $personal_info_hash_ref = \%personal_info; # notice the backslash
my %personal_info0 = %$personal_info_hash_ref; # dereferencing: get the hash back from reference

my $first_name2= $personal_info_hash_ref->{"first_name"}; # dereferencing: notice the '->' operator
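
One practical consequence of references is that a reference points at the original data, while a plain array assignment makes a copy. The following sketch, with illustrative variable names, shows the difference.

```perl
use strict;
use warnings;

my @owners = ("Vivek", "Dhivya");

my @owners_copy = @owners;    # copies the elements into a new array
my $owners_ref  = \@owners;   # refers to the same array, no copy is made

push @owners_copy, "Copy only";   # changes only the copy
push @$owners_ref, "Sanjana";     # changes @owners itself

print "Original : @owners\n";
print "Copy     : @owners_copy\n";
```

After running this, @owners contains the element added through the reference but not the one added to the copy.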


Anonymous arrays and hashes

In the previous section we saw how to define references for a given array or hash variable. It is a two-step process: first an array or hash variable is defined, and then it is converted to a scalar reference by adding a 'backslash' before the variable. Now I will show how array and hash references can be created without defining the variables first. This is a very important concept to understand for handling complex data models such as multidimensional arrays and even object-oriented Perl. Once you comprehend this concept you can call yourself a Perl "Code Breaker".
# Anonymous Array reference 
my $personal_info_array_ref = ["vivek", "gopalan"]; # just use square brackets instead of 'circular' brackets and assign the value to a scalar($) instead of array (@).

# This has the same effect as defining an array and then storing a reference to it in a scalar variable
my @personal_info = ("vivek", "gopalan");
my $personal_info_array_ref = \@personal_info; # notice the 'backslash'


my $first_name1 = $personal_info_array_ref->[0]; # dereferencing: notice the '->' operator to extract the data

# Anonymous Hash reference 

my $personal_info_hash_ref = { "first_name" => "vivek" }; # notice the curly brackets and assignment to the scalar($) value.

# This is similar to defining a hash and then obtaining a reference to it through the backslash
my %personal_info = ("first_name" => "vivek");
my $personal_info_hash_ref = \%personal_info; # notice the backslash

Power of Anonymous References

One more step to understand the power of anonymous references, this time by example so that it is easy to learn. I will take two-dimensional (table) data as the example. This could be a typical database result in a real application.
# Important rule to reiterate for arrays and hashes

# The value of each element in the array or hash must be a scalar

# my %info = ("degrees" => ("B.Tech","PhD") ); # This statement is wrong: the inner list is flattened, so the hash receives three elements ("degrees", "B.Tech", "PhD") instead of one key-value pair
# solution - an anonymous array solves this problem
my %info = ("degrees" => ["B.Tech","PhD"] ); # now the value is a reference to an anonymous array rather than a regular list; note the circular brackets around the hash contents

# dereferencing:

# the three different ways to get the "B.Tech" degree out of the %info variable

my $degrees_ref = $info{degrees}; # get the array reference value associated with the key
my $first_degree1 = $degrees_ref->[0]; # 1 - get the specific array value from the reference

my @degrees_array = @$degrees_ref; # Dereference the reference value
my $first_degree2 = $degrees_array[0]; # 2 - a single element is accessed with the '$' sigil

my $first_degree3 = $info{"degrees"}->[0]; # 3 -- the one liner

# How will you know the type of a specific reference? This is important when you
# handle someone else's data.

# New concept: the ref() function can be used to find the type of a reference.
# ref($degrees_ref) gives ARRAY
# the possible values include ARRAY, HASH, SCALAR and CODE, or an empty string if the value is not a reference
# ref($first_degree3) gives an empty string
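
The behavior of ref() can be checked with a small sketch like this one. The sample values are illustrative; note that for a plain (non-reference) scalar, ref() returns an empty string.

```perl
use strict;
use warnings;

my $array_ref  = ["B.Tech", "PhD"];
my $hash_ref   = { degrees => $array_ref };
my $scalar_ref = \"B.Tech";
my $code_ref   = sub { return 1 };
my $plain      = "B.Tech";            # an ordinary string, not a reference

my @labels;
foreach my $item ($array_ref, $hash_ref, $scalar_ref, $code_ref, $plain) {
    my $type  = ref($item);           # empty string when $item is not a reference
    my $label = $type eq '' ? "not a reference" : $type;
    push @labels, $label;
    print "$label\n";
}
```

A check such as ref($value) eq 'ARRAY' is a common way to decide how to dereference data received from elsewhere.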


# Coming back to the 2 dimensional data.
# let's first try to store a simple 2 x 2 matrix as a Perl variable. Let's take the following example.
#   00 01
#   10 11
my @matrix = (("00","01"),("10","11"));
print "Number of elements in the matrix array : " . scalar(@matrix) . "\n";
# the inner brackets are flattened, so @matrix is just a flat array with 4 elements, i.e. $matrix[2] is "10".

# Anonymous reference or reference can help to solve this problem

my @first_row = ("00","01");
my @second_row = ("10","11");

my @matrix = (\@first_row, \@second_row);
# to get "01" (first row, second column value)

my $value01 = $matrix[0]->[1]; # $matrix[0] gives the array reference of @first_row, and to get the second value from that array reference we use the '->' operator.

# similarly to get the "11" element (second row, second column entry) from the matrix
my $value11 = $matrix[1]->[1];

# simplifying things using Anonymous references
my @matrix = (["00","01"], ["10","11"]); # notice the circular brackets.
my $val1 = $matrix[1]->[1]; # to get "11" value

my $matrix_ref = [ ["00","01"], ["10","11"] ]; # notice the square brackets
my $val2 = $matrix_ref->[1]->[1]; # to get "11" value , notice the double '->' usage


# to print out the results as 2D table from the $matrix_ref variable

my $nrows = scalar @$matrix_ref; # the size of the dereferenced array is the number of rows

for (my $row = 0; $row < $nrows ; $row ++) {
   my $ncols = scalar @{$matrix_ref->[$row]}; # Dereference the row, which is itself an array reference, to get its number of columns.
   for (my $col = 0; $col < $ncols ; $col ++) {
      print $matrix_ref->[$row]->[$col], "\t";
   }
   print "\n";
}

# the same logic can be used to store matrices of any dimension in Perl.
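
As an alternative sketch of the same table printing, foreach and join can replace the index-based loops. This is only a stylistic variation on the approach above, not a different data model.

```perl
use strict;
use warnings;

my $matrix_ref = [ ["00", "01"], ["10", "11"] ];

my @lines;
foreach my $row_ref (@$matrix_ref) {        # each element is an array reference
    my $line = join("\t", @$row_ref);       # dereference the row and join its cells
    push @lines, $line;
    print "$line\n";
}
```

This form avoids tracking the row and column counts by hand, at the cost of not having the indices available inside the loop.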

Perl data model in practice

Let me use my car data as an example to explain this concept

my $car_data_ref = {
   "model" => "Hyundai Accent",
   "make"  => "2007",
   "color"  => "Platinum",
   "type"  => "sub-compact car",
   "amount_spend_on_services" => {
      2008 => 200,
      2009 => 500, 
      2010  => 300
   },
   "owners" => ["Vivek","Dhivya"],

};

# As you have noticed we stored all the details about the car in a complex hash reference variable.
# You need to know how to perform CRUD (Create, Read, Update and Delete) actions or operations on the contents of the data. Let me show how this can be done.

# Create: Add a new attribute called 'gear_type' and assign 'automatic' as scalar value.
$car_data_ref->{'gear_type'} = "automatic"; # set the new attribute for the hash reference

# Read : Find the total amount of money spent on servicing

my $total = 0;
my $amounts_hashref = $car_data_ref->{'amount_spend_on_services'};

foreach my $year (keys %$amounts_hashref) {
    $total += $amounts_hashref->{$year};
}
print "Total amount spent on service : ", $total, " USD\n";

# Update : add 'Sanjana' as owner along with Vivek and Dhivya
push @{$car_data_ref->{'owners'}}, "Sanjana"; # dereference the array and then push the new scalar value to it.

# Delete : remove the 'type' attribute from the car data
delete $car_data_ref->{'type'}; # delete removes the key; assigning undef would only clear the value and leave the key in place.
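
The difference between assigning undef to a hash entry and using delete can be sketched as follows, using a small stand-alone hash with illustrative values rather than the full car data.

```perl
use strict;
use warnings;

my %car = (type => "sub-compact car", color => "Platinum");

# assigning undef clears the value, but the key is still present
$car{type} = undef;
my $exists_after_undef  = exists  $car{type} ? 1 : 0;
my $defined_after_undef = defined $car{type} ? 1 : 0;

# delete removes the key together with its value
delete $car{type};
my $exists_after_delete = exists $car{type} ? 1 : 0;

print "After undef  : exists=$exists_after_undef defined=$defined_after_undef\n";
print "After delete : exists=$exists_after_delete\n";
```

The distinction matters when iterating with keys: a key whose value is undef is still returned, whereas a deleted key is not.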

One of the critical components of any software development effort is the data model. It is worth spending enough time to design a good one.

Object-Oriented Perl: Blessed references

In the previous section, we learned about references and anonymous references. Using the matrix and car data as examples, I explained how complex data models can be created and manipulated. Now I will go through some of the problems with complex data models built from plain references and explain how they can be solved using blessed references, or objects. The $car_data_ref hash reference that we created in the previous example can store the attributes of a car, but it would be far more useful if we could associate subroutines with it to modify or extract its details. For example, methods such as get_total_amount_spent(), set_type() or get_owners() that operate on the contents of $car_data_ref would be very valuable when you want to share your data model with others or provide meaningful access to it. The data, together with its properties and all its behaviors (subroutines), is referred to as an object in the object-oriented programming world.
"In the domain of object-oriented programming an object is usually taken to mean a compilation of attributes (object elements) and behaviors (methods or subroutines) encapsulating an entity. In this way, whilst primitive or simple data types are still just single pieces of information, object oriented objects are complicated types that have multiple pieces of information and specific properties (or attributes). Instead of merely being assigned a value, (like int =10), objects have to be "constructed". In the real world, if a gun (let's say a Colt 45) is an "object", its physical properties and its function to shoot would have been individually specified. Once the properties of this Colt 45 "object" had been specified into the form of a class (let's call it 'gun'), it can be endlessly copied to create identical objects that look and function in just the same way. As an alternative example, animal is a superclass of primate and primate is a a superclass of human. Individuals such as Joe Bloggs or John Doe would be particular examples or 'objects' of the human class, and consequently possess all the characteristics of the human class (and of the primate and animal superclasses as well." -- WikiPedia
In Perl, the data attributes of an object are usually represented as an anonymous hash reference. The 'bless' command is then used to link the anonymous hash reference with a specific class. I will explain this concept by extending the example from the previous section.
package Car;

sub new {
    my ($class) = @_;
    my $data = {
      "model" => "Hyundai Accent",
      "make"  => "2007",
      "color"  => "Platinum",
      "type"  => "sub-compact car",
      "amount_spend_on_services" => {
         2008 => 200,
         2009 => 500, 
         2010  => 300
      },
      "owners" => ["Vivek","Dhivya"],
   };

   bless $data, $class;
   return $data;
}
sub get_total_amount_spent {
   my ($self) = @_; # note the first argument is the object data..
   my $total = 0;
   my $amounts_hashref = $self->{'amount_spend_on_services'};

   foreach my $year (keys %$amounts_hashref) {
       $total += $amounts_hashref->{$year};
   }
   return $total;
}

sub set_type {
   my ($self, $type) = @_; # the first argument is the object data and the second argument is the value passed
   $self->{'type'} = $type;
}

# end of Car package
# beginning of the the default package..
package main;

my $car_obj = Car->new(); # Car->new() is preferred over the older indirect form 'new Car()'

$car_obj->set_type('Compact Car'); # Call the subroutine
my $total_amt_spent = $car_obj->get_total_amount_spent();

print "Total amount spent : $total_amt_spent USD \n";

my $type = $car_obj->{'type'}; ## Note: this works, but in Object Oriented Programming (OOPs) good practice, the attributes should be hidden and should be exposed through subroutines.
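
To show how attributes can be exposed through subroutines instead of direct hash access, here is a minimal sketch. The SimpleCar package, its constructor arguments and its getter methods are hypothetical and are not part of the Car package above.

```perl
use strict;
use warnings;

package SimpleCar;   # hypothetical class used only for this sketch

# Constructor: takes attribute values as key-value pairs
sub new {
    my ($class, %args) = @_;
    my $self = {
        model  => $args{model},
        owners => $args{owners} || [],   # default to an empty owner list
    };
    return bless $self, $class;          # link the hash reference to the class
}

# Getter for a scalar attribute
sub get_model {
    my ($self) = @_;
    return $self->{model};
}

# Getter that returns the owners as a list
sub get_owners {
    my ($self) = @_;
    return @{ $self->{owners} };
}

package main;

my $car = SimpleCar->new(
    model  => "Hyundai Accent",
    owners => ["Vivek", "Dhivya"],
);

my $class_name = ref($car);              # a blessed reference reports its class
my $model      = $car->get_model();
my @owners     = $car->get_owners();

print "Class  : $class_name\n";
print "Model  : $model\n";
print "Owners : @owners\n";
```

Passing the attribute values to new() lets the same class describe many different cars, whereas a constructor with hard-coded values always builds the same object.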

Need to explain 'package', bless and subroutine statements - vivek.