Basic operations on Python sequences

Here we cover some basic operations on Python sequences. Let's start by creating a string, a tuple and a list.

In [1]:
strFruits='Apples, Oranges, Bananas, Coconuts, Lychee, Watermelon, Mangoes'
In [2]:
tplFruits=('Apples','Oranges','Bananas','Coconuts','Lychee','Watermelon','Mangoes')
In [3]:
lstFruits=['Apples', 'Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes']

A sequence has a length, which we can find using len(). Note that the length of a string counts characters, while the length of a tuple or list counts elements.

In [4]:
print 'Lengths are -- string:',len(strFruits), 'tuple:', len(tplFruits), 'list:', len(lstFruits)
Lengths are -- string: 63 tuple: 7 list: 7

String operations

First, we will talk about some string operations.

String constants

The string module defines a number of constant strings which can come in handy.

In [5]:
import string
print 'Digits',string.digits
print 'ASCII Letters',string.ascii_letters
print 'ASCII Lowercase',string.ascii_lowercase
print 'ASCII Uppercase',string.ascii_uppercase
print 'Punctuation',string.punctuation
print 'Whitespace',[string.whitespace]
Digits 0123456789
ASCII Letters abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
ASCII Lowercase abcdefghijklmnopqrstuvwxyz
ASCII Uppercase ABCDEFGHIJKLMNOPQRSTUVWXYZ
Punctuation !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
Whitespace ['\t\n\x0b\x0c\r ']

String Functions

In Python 2, string operations can be called as functions of the string module, in the form string.function(...). For example, to count how many times a substring appears in a string, the following function can be used:

In [6]:
string.count(strFruits,',')
Out[6]:
6

Another way of calling the same function, and the only form that still works in Python 3, is as a method of the string itself:

In [7]:
strFruits.count(',')
Out[7]:
6

The 'find' function can be used to search for a character or substring within a string. If the substring occurs within the string, the starting index is returned. If not found, then -1 is returned.

In [8]:
strFruits.find('Man')
Out[8]:
56
In [9]:
print strFruits[strFruits.find('Ban'):]
Bananas, Coconuts, Lychee, Watermelon, Mangoes

In [10]:
strFruits.find('G')
Out[10]:
-1

The 'index' function is similar to the 'find' function except that it throws a 'ValueError' exception if the substring is not found.

In [11]:
strFruits.index('M')
Out[11]:
56
In [12]:
strFruits.index('G')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-22e1b0e51e1f> in <module>()
----> 1 strFruits.index('G')

ValueError: substring not found

The simplest way to check if a substring occurs in a string is to use 'in':

In [13]:
'A' in strFruits
Out[13]:
True
In [14]:
'G' in strFruits
Out[14]:
False
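
The difference between 'find' and 'index' matters in practice because -1 is itself a valid index (the last character), so an unchecked result of 'find' can silently produce a wrong slice. The following is a minimal sketch of a wrapper that returns None for a missing substring; the helper name locate is hypothetical and not part of the notebook, and the code is written to run under both Python 2 and Python 3.

```python
# Illustrative sketch; locate() is a made-up helper name, not a built-in.
fruits = 'Apples, Oranges, Bananas, Coconuts, Lychee, Watermelon, Mangoes'

def locate(text, sub):
    """Return the index of sub in text, or None if sub does not occur."""
    try:
        return text.index(sub)   # raises ValueError when sub is missing
    except ValueError:
        return None

pos_found = locate(fruits, 'Man')    # same position that find() reports
pos_missing = locate(fruits, 'G')    # None, rather than -1 or an exception

print(pos_found)
print(pos_missing)
```

Returning None forces the caller to handle the missing case explicitly instead of accidentally slicing from position -1.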

A number of functions are available to check if the string is of a certain type.

Check if a string is alphanumeric:

In [15]:
'123AB'.isalnum()
Out[15]:
True
In [16]:
'123AB-'.isalnum()
Out[16]:
False

Check if a string consists only of digits:

In [17]:
'123'.isdigit()
Out[17]:
True

Check if a string is lowercase:

In [18]:
'hello'.islower()
Out[18]:
True

'lower' and 'upper' functions can be used to convert a string to lower and upper case, respectively.

In [19]:
'HELLO'.lower().islower()
Out[19]:
True

'lstrip' removes leading whitespace from a string and 'rstrip' removes trailing whitespace. 'strip' does both.

In [20]:
'  hello  '.strip()
Out[20]:
'hello'
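
For comparison, here is a small sketch showing all three stripping methods on the same input. The variable names are illustrative only. The last line uses the optional argument of 'strip', which names the characters to remove instead of whitespace.

```python
padded = '  hello  '

left_stripped = padded.lstrip()    # leading spaces removed: 'hello  '
right_stripped = padded.rstrip()   # trailing spaces removed: '  hello'
both_stripped = padded.strip()     # both ends removed: 'hello'

# strip() also accepts the set of characters to remove from both ends.
custom_stripped = 'xxhelloxx'.strip('x')

print([left_stripped, right_stripped, both_stripped, custom_stripped])
```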

'+' can be used to concatenate strings.

In [21]:
'hello'+' world'
Out[21]:
'hello world'

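'a.join(b)' concatenates the elements of 'b', inserting 'a' between consecutive elements. When 'b' is a string, its elements are its individual characters.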

In [22]:
' '.join(strFruits)
Out[22]:
'A p p l e s ,   O r a n g e s ,   B a n a n a s ,   C o c o n u t s ,   L y c h e e ,   W a t e r m e l o n ,   M a n g o e s'

The 'split' function breaks up a string into a list based on a specific substring.

In [23]:
strFruits.split(',')
Out[23]:
['Apples',
 ' Oranges',
 ' Bananas',
 ' Coconuts',
 ' Lychee',
 ' Watermelon',
 ' Mangoes']
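
Notice that splitting on ',' leaves a leading space on every item except the first. Two common ways of getting clean items are sketched below, using illustrative variable names: strip each piece after splitting, or split on the full separator ', '.

```python
fruits = 'Apples, Oranges, Bananas, Coconuts, Lychee, Watermelon, Mangoes'

# Option 1: split on the comma, then strip whitespace from each piece.
parts_stripped = [piece.strip() for piece in fruits.split(',')]

# Option 2: split on the exact separator, comma followed by a space.
parts_exact = fruits.split(', ')

print(parts_stripped)
print(parts_stripped == parts_exact)
```

The first option is more forgiving when the spacing after the commas is inconsistent.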

'str' converts other objects, including sequences such as lists, into a string.

In [24]:
str(lstFruits)
Out[24]:
"['Apples', 'Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes']"

Why is len(str(lstFruits)) different from len(strFruits)?

In [25]:
len(str(lstFruits))
Out[25]:
79
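
One way to answer the question is to account for the extra characters: the string form of a list wraps every element in quotes and encloses the whole list in square brackets. The sketch below checks this arithmetic; the variable names are illustrative.

```python
fruit_list = ['Apples', 'Oranges', 'Bananas', 'Coconuts',
              'Lychee', 'Watermelon', 'Mangoes']

joined = ', '.join(fruit_list)   # the plain comma-separated string
as_text = str(fruit_list)        # the printable form of the list

quote_chars = 2 * len(fruit_list)   # one opening and one closing quote per item
bracket_chars = 2                   # '[' and ']'

print(len(joined))
print(len(joined) + quote_chars + bracket_chars)
print(len(as_text))
```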

'join' can also concatenate list elements. See the examples below.

In [26]:
''.join(lstFruits)
Out[26]:
'ApplesOrangesBananasCoconutsLycheeWatermelonMangoes'
In [27]:
', '.join(lstFruits)
Out[27]:
'Apples, Oranges, Bananas, Coconuts, Lychee, Watermelon, Mangoes'

'==' can be used to check whether two strings are equal.

In [28]:
', '.join(lstFruits)==strFruits
Out[28]:
True

'replace' replaces all occurrences of a substring with another.

In [29]:
strFruits.replace('a','A')
Out[29]:
'Apples, OrAnges, BAnAnAs, Coconuts, Lychee, WAtermelon, MAngoes'

Exercise

Remove all punctuation and numbers from the following string and convert the whole string to lower case:

In [30]:
strCaesar="In cryptography, a Caesar cipher, also known as Caesar\'s cipher, the shift cipher, Caesar's code or Caesar shift, is one of the simplest and most widely known encryption techniques. It is a type of substitution cipher in which each letter in the plaintext is replaced by a letter some fixed number of positions down the alphabet. For example, with a left shift of three, D would be replaced by A, E would become B, and so on. The method is named after Julius Caesar, who used it in his private correspondence.In cryptography, a Caesar cipher, also known as Caesar's cipher, the shift cipher, Caesar's code or Caesar shift, is one of the simplest and most widely known encryption techniques. It is a type of substitution cipher in which each letter in the plaintext is replaced by a letter some fixed number of positions down the alphabet. For example, with a left shift of three, D would be replaced by A, E would become B, and so on. The method is named after Julius Caesar (100BC-44BC), who used it in his private correspondence."
In [31]:
print strCaesar
In cryptography, a Caesar cipher, also known as Caesar's cipher, the shift cipher, Caesar's code or Caesar shift, is one of the simplest and most widely known encryption techniques. It is a type of substitution cipher in which each letter in the plaintext is replaced by a letter some fixed number of positions down the alphabet. For example, with a left shift of three, D would be replaced by A, E would become B, and so on. The method is named after Julius Caesar, who used it in his private correspondence.In cryptography, a Caesar cipher, also known as Caesar's cipher, the shift cipher, Caesar's code or Caesar shift, is one of the simplest and most widely known encryption techniques. It is a type of substitution cipher in which each letter in the plaintext is replaced by a letter some fixed number of positions down the alphabet. For example, with a left shift of three, D would be replaced by A, E would become B, and so on. The method is named after Julius Caesar (100BC-44BC), who used it in his private correspondence.

The solution is below:

In [32]:
strClean=strCaesar
for c in string.punctuation+string.whitespace+string.digits:
    strClean=strClean.replace(c,'')
strClean=strClean.lower()
print strClean
print strClean.isalpha() and strClean.islower()
incryptographyacaesarcipheralsoknownascaesarsciphertheshiftciphercaesarscodeorcaesarshiftisoneofthesimplestandmostwidelyknownencryptiontechniquesitisatypeofsubstitutioncipherinwhicheachletterintheplaintextisreplacedbyalettersomefixednumberofpositionsdownthealphabetforexamplewithaleftshiftofthreedwouldbereplacedbyaewouldbecomebandsoonthemethodisnamedafterjuliuscaesarwhouseditinhisprivatecorrespondenceincryptographyacaesarcipheralsoknownascaesarsciphertheshiftciphercaesarscodeorcaesarshiftisoneofthesimplestandmostwidelyknownencryptiontechniquesitisatypeofsubstitutioncipherinwhicheachletterintheplaintextisreplacedbyalettersomefixednumberofpositionsdownthealphabetforexamplewithaleftshiftofthreedwouldbereplacedbyaewouldbecomebandsoonthemethodisnamedafterjuliuscaesarbcbcwhouseditinhisprivatecorrespondence
True
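
An alternative approach, sketched below on a shorter sample string, keeps only the alphabetic characters instead of removing the unwanted ones one by one. This is just one possible variation, not the notebook's solution; the sample text and variable names are made up for illustration.

```python
sample = "Caesar's cipher (100BC-44BC), a shift of 3!"

# Keep a character only if it is a letter, then lower-case the result.
letters_only = ''.join(ch for ch in sample if ch.isalpha()).lower()

print(letters_only)
```

This makes a single pass over the text, whereas the loop above calls 'replace' once for every punctuation, whitespace and digit character.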

Operations on Tuples

'+' concatenates tuples.

In [34]:
print tplFruits+('Guava',)
('Apples', 'Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes', 'Guava')

'*' repeats a tuple, just as it repeats a string.

In [35]:
print 2*tplFruits
('Apples', 'Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes', 'Apples', 'Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes')

Exercise: Tuples in Swapping

Tuples can be used in swapping as shown below.

In [36]:
a=1
b=2
c=3
d=4
In [37]:
a,b,c,d=d,c,a,b
In [38]:
print a,b,c,d
4 3 1 2

Exercise: Tuples in function returns

Multiple returns from a function are, in reality, implemented as tuples.

In [39]:
def func(x,y):    
    return x+y,y*x
In [40]:
v = func(4,2)
print type(v)
a,b=v
print a,b
<type 'tuple'>
6 8
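
As a further illustration of the same idea, the sketch below returns two values from a function and shows that the result can either be kept as a single tuple or unpacked directly. The function name min_and_max is hypothetical.

```python
def min_and_max(values):
    """Return the smallest and largest value as a 2-tuple."""
    return min(values), max(values)   # the comma builds the tuple

result = min_and_max([3, 1, 4, 1, 5])   # kept as one tuple
lowest, highest = result                # unpacked into two names

print(result)
print(lowest)
print(highest)
```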

Operations on lists

'+' concatenates lists too. Below, we concatenate one list with another list whose only element is a tuple.

In [41]:
tmp=lstFruits+[tplFruits]
print 'Length of the list tmp is',len(tmp),'and the contents are',tmp
print "The last element of tmp is:",tmp[-1]
Length of the list tmp is 8 and the contents are ['Apples', 'Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes', ('Apples', 'Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes')]
The last element of tmp is: ('Apples', 'Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes')

Multiple indexing is also supported:

In [42]:
print tmp[-1][2]
Bananas

We can also delete elements from a list as follows:

In [43]:
del tmp[-1]
print 'After deletion',tmp
After deletion ['Apples', 'Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes']

Since we have removed the last element of tmp it should now be equal to lstFruits:

In [44]:
print tmp==lstFruits
True

'all' and 'any' can be used to check if all or any elements in a sequence are logically true.

In [45]:
print all([1,0])
False

In [46]:
print all((1,)),all('1')
True True

In [47]:
print any([1,0])
True

'append' can be used to append elements to a list.

In [48]:
lstFruits.append('Apples')
print lstFruits
['Apples', 'Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes', 'Apples']

'count' can be used to count the number of occurrences of an element in a list.

In [49]:
print lstFruits.count('Apples')
2

'append' always adds its argument as a single new element, whatever its type. Here we append a list to lstFruits, and it is stored as one nested element.

In [50]:
lstFruits.append(['Grapes','Strawberry'])
print lstFruits
['Apples', 'Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes', 'Apples', ['Grapes', 'Strawberry']]

Deletion of the list's last element results in the removal of the list we added above. The point: a list can be an element of another list and so on.

In [51]:
del lstFruits[-1]
print lstFruits
['Apples', 'Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes', 'Apples']

We can also remove the first occurrence of an element using 'remove'.

In [52]:
lstFruits.remove('Apples')
In [53]:
print lstFruits
['Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes', 'Apples']

'extend' adds each element of another list to the end of the list.

In [54]:
lstFruits.extend(['Grapes','Strawberry'])
print lstFruits
['Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes', 'Apples', 'Grapes', 'Strawberry']
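
The difference between 'append' and 'extend' is easiest to see side by side. The sketch below applies both to copies of the same starting list; the variable names are illustrative.

```python
base = ['Apples', 'Oranges']
extra = ['Grapes', 'Strawberry']

appended = list(base)      # work on a copy so base stays unchanged
appended.append(extra)     # extra becomes ONE nested element

extended = list(base)
extended.extend(extra)     # each item of extra becomes its own element

print(appended)
print(extended)
```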

'index' finds the index at which a certain element occurs in the list. 'in' can be used to test presence of an element within a list.

In [55]:
lstFruits.index('Grapes')
Out[55]:
7
In [56]:
lstFruits.index('Tomato')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-56-83c5224c2a0b> in <module>()
----> 1 lstFruits.index('Tomato')

ValueError: 'Tomato' is not in list
In [57]:
'Tomato' in lstFruits
Out[57]:
False

'pop' removes an element from a list and returns it: the last element by default, or the element at a given index.

In [58]:
v=lstFruits.pop()
print 'The popped element is:',v
print 'The list is:',lstFruits
v=lstFruits.pop(2) # pop the 3rd element
print 'After removal of 3rd element \''+v+'\', the list is:',lstFruits
The popped element is: Strawberry
The list is: ['Oranges', 'Bananas', 'Coconuts', 'Lychee', 'Watermelon', 'Mangoes', 'Apples', 'Grapes']
After removal of 3rd element 'Coconuts', the list is: ['Oranges', 'Bananas', 'Lychee', 'Watermelon', 'Mangoes', 'Apples', 'Grapes']

'reverse' reverses the list.

In [59]:
lstFruits.reverse()
print lstFruits
['Grapes', 'Apples', 'Mangoes', 'Watermelon', 'Lychee', 'Bananas', 'Oranges']

Exercise: Add an element after the second element in the list.

Solution:

In [60]:
lstFruits=lstFruits[:2]+['Oranges']+lstFruits[2:]
print lstFruits
['Grapes', 'Apples', 'Oranges', 'Mangoes', 'Watermelon', 'Lychee', 'Bananas', 'Oranges']
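
The same exercise can also be solved with the list method 'insert', which modifies the list in place rather than building a new one from slices. A minimal sketch on a shorter, made-up list:

```python
fruits = ['Grapes', 'Apples', 'Mangoes', 'Watermelon']

# insert(i, x) places x before the element currently at index i,
# so index 2 puts the new item right after the second element.
fruits.insert(2, 'Oranges')

print(fruits)
```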

'min','max' and 'sum' do what they are named for.

In [61]:
lstRange=range(10)
print lstRange
print 'min:',min(lstRange),'max:', max(lstRange),'sum:',sum(lstRange)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
min: 0 max: 9 sum: 45

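'sorted' and 'sort' both sort, but in different ways: the built-in function 'sorted' returns a new sorted list and leaves the original unchanged, whereas the list method 'sort' sorts the list in place.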

In [62]:
lstRange=range(10)+range(-3,5)
print lstRange
print sorted(lstRange)
print lstRange
lstRange.sort()
print lstRange
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -3, -2, -1, 0, 1, 2, 3, 4]
[-3, -2, -1, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -3, -2, -1, 0, 1, 2, 3, 4]
[-3, -2, -1, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
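
A common pitfall is to assign the result of 'sort' to a variable: since it sorts in place, it returns None. The sketch below shows this, along with the optional 'reverse' and 'key' arguments that both 'sorted' and 'sort' accept. The variable names are illustrative.

```python
numbers = [3, -1, 2, 0]

ordered = sorted(numbers)        # new list; numbers is untouched so far
sort_result = numbers.sort()     # sorts numbers in place and returns None

descending = sorted(numbers, reverse=True)   # largest first
by_magnitude = sorted(numbers, key=abs)      # ordered by absolute value

print(ordered)
print(sort_result)
print(descending)
print(by_magnitude)
```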

'==' can be used to test for element-wise equality of two lists.

In [64]:
a=1
b=2
[1,2]==[a,b]
Out[64]:
True

List Construction and Comprehension

There are different ways of constructing a list:

In [66]:
lstNew=[]
for i in range(10):
    lstNew.append(i**2)
print lstNew
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [1]:
lstNew=[i**2 for i in range(10)]
print lstNew
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [68]:
print [i**2 for i in range(10) if 2*i>=i**2]
[0, 1, 4]

In [69]:
print [ i**2 if 2*i>=i**2 else i for i in  range(10) ]
[0, 1, 4, 3, 4, 5, 6, 7, 8, 9]

Exercise

What does the following code do?

[i*j for i in range(10) for j in range(10) if i%2 and not j%2]
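
One way to check your answer is to unroll the comprehension into explicit loops. The 'for' clauses nest from left to right and the 'if' clause filters the innermost body. The sketch below builds the list both ways; the variable names are illustrative.

```python
# Explicit-loop version: the first 'for' is the outer loop.
unrolled = []
for i in range(10):
    for j in range(10):
        if i % 2 and not j % 2:      # i is odd and j is even
            unrolled.append(i * j)

# The original comprehension, for comparison.
compact = [i*j for i in range(10) for j in range(10) if i%2 and not j%2]

print(unrolled == compact)
print(len(compact))
```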

By reference and by value, shallow and deep copying

Let's make a list of lists and change one element in this list of lists.

In [70]:
a=[1,2,3]
b=[4]
print 'a =',a
lol=[a,a]+b #equivalently lol=[a]*2+b
print 'lol =',lol
lol[1][1]=100
print 'lol =',lol
print 'a =',a
a = [1, 2, 3]
lol = [[1, 2, 3], [1, 2, 3], 4]
lol = [[1, 100, 3], [1, 100, 3], 4]
a = [1, 100, 3]

Note that changing an element in 'lol' changes the corresponding element in 'a'. This is because of Python's philosophy of making no unnecessary copies: 'lol' stores references to 'a' rather than copies of it.

Let's see another example to illustrate the same issue.

In [71]:
def foo(a):
    a[-1]='last'
    b=['this','is','added']
    return a+b

a=[1,2,3]
print foo(a)
print a
[1, 2, 'last', 'this', 'is', 'added']
[1, 2, 'last']

Exercise: Change this behavior to prevent modifications to the "outer" list 'a'.
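
One possible solution, sketched below, is to copy the argument inside the function before modifying it. The function name foo_no_side_effect is hypothetical, and copying with list() is only one of several options.

```python
def foo_no_side_effect(a):
    local = list(a)          # new list; the caller's list is left alone
    local[-1] = 'last'
    b = ['this', 'is', 'added']
    return local + b

outer = [1, 2, 3]
result = foo_no_side_effect(outer)

print(result)
print(outer)
```

Note that list() copies only the outer list, which is sufficient here because the elements are not themselves mutable.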

Another example, this time with plain assignment, which copies only the reference and not the list:

In [72]:
list1 = [1, 'element']
list2 = list1
print 'Are the ids the same: ', id(list1)==id(list2)
list2[0] = 2 # Modifies the one list that both names refer to
print 'list1[0]=',list1[0] # Displays 2
print 'list2[0]=',list2[0] # Displays 2
Are the ids the same:  True
list1[0]= 2
list2[0]= 2

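[:] can be used to copy the list itself, so that the two names refer to different list objects. Strictly speaking this copy is only one level deep: the new list still shares its elements with the original, which matters only when the elements are themselves mutable.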

In [73]:
list1= [1, 'element']
list2 = list1[:] # Copy using "[:]"
print 'Are the ids the same: ', id(list1)==id(list2)
list2[0] = 2 #Making a change
print 'list1[0]=',list1[0] # Displays 1
print 'list2[0]=',list2[0] # Displays 2
Are the ids the same:  False
list1[0]= 1
list2[0]= 2

To copy nested objects as well, that is, to make a true deep copy, use 'deepcopy' from the 'copy' module.

In [74]:
import copy
list1 = [1, [2, 3]] # Notice the second item being a nested list
list2 = copy.deepcopy(list1) # A deep copy
print 'Are the ids the same: ', id(list1)==id(list2)
list2[1][0] = 4 # Leaves the 2nd item of list1 unmodified
print 'list1[1][0]=',list1[1][0] # Displays 2
print 'list2[1][0]=',list2[1][0] # Displays 4
Are the ids the same:  False
list1[1][0]= 2
list2[1][0]= 4
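
One caveat is worth checking for yourself: a slice copy made with [:] creates a new outer list, but any nested lists inside it are still shared with the original. The sketch below applies both [:] and copy.deepcopy to the same nested list; the variable names are illustrative.

```python
import copy

nested = [1, [2, 3]]

sliced = nested[:]               # new outer list, inner list is shared
deep = copy.deepcopy(nested)     # new outer list AND new inner list

sliced[1][0] = 'changed'         # modifies the shared inner list

print(nested)   # the original sees the change made through sliced
print(deep)     # the deep copy is unaffected
```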

Important conceptual note: What exactly are python variables?

Instead of focusing just on the mechanics, let's dig a little deeper this time. The concept manifesting itself as the behavior you have seen earlier is:

In Python, variables are references to objects.

This generalizes to all variables in Python and not just sequences. Python variables merely refer to Python objects; they are not, themselves, the objects they refer to.

A Python object exists in memory and any number of variables can refer to it. Python objects are typed but variables are untyped. Assignment (x=y) copies only the reference held by y into x, so both variables then refer to the same object. A deep copy, on the other hand, creates a new object, copying any nested objects too, and makes the variable refer to this new copy. In between lies what Python's documentation calls a shallow copy (for example list1[:] or copy.copy): a new container whose elements are still shared with the original.

Now reread the code above in light of this concept: after the assignment list2=list1, the variables list1 and list2 refer to the same object. The identity of an object can be obtained by calling id() on a variable that refers to it. The ids are the same after assignment but differ after copying with [:] or copy.deepcopy, which shows that copying produces a new object.
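
The distinction between "same object" and "equal contents" can also be tested directly with the operators 'is' and '=='. A minimal sketch, with illustrative variable names:

```python
original = [1, 2, 3]
alias = original          # assignment: a second reference to the same object
clone = list(original)    # a new list object with equal contents

same_object = alias is original        # identity: True
equal_contents = clone == original     # equality: True
clone_is_original = clone is original  # identity: False

alias.append(4)           # visible through both names, but not through clone

print(original)
print(clone)
```

'x is y' gives the same answer as id(x)==id(y) but reads more naturally.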

Apart from explaining the behavior above, the fact that variables are references has further implications, as the following exercise shows.

Thought exercise: Why doesn't x change outside the function in the code below?

In [75]:
def fcn(x):
    y=[4,5,6]
    print 'y before change:',y
    x=y    
    x[0]=10
    print 'y after change through x:',y
    
x=[1,2,3]
print 'x before function call:',x
fcn(x)
print 'x after function call:',x
x before function call: [1, 2, 3]
y before change: [4, 5, 6]
y after change through x: [10, 5, 6]
x after function call: [1, 2, 3]

Solution

The call fcn(x) binds the function's parameter x to the same object as the outer x; in effect it copies the reference, just as an assignment would. Inside the function, the statement x=y rebinds this local reference to the object that y refers to, so x[0]=10 modifies that object and not the caller's list. The original reference (x outside the function) still points to [1, 2, 3]. The role of assignment in Python function calls is made explicit by the keyword form of the call: fcn(x=x).
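
To make the contrast concrete, the sketch below compares a function that rebinds its parameter with one that mutates the object the parameter refers to. The function names rebind and mutate are hypothetical.

```python
def rebind(seq):
    seq = [4, 5, 6]    # the local name now refers to a different list
    seq[0] = 10        # changes that new list only

def mutate(seq):
    seq[0] = 10        # changes the list the caller passed in

x = [1, 2, 3]
rebind(x)              # x is unaffected

y = [1, 2, 3]
mutate(y)              # y is changed

print(x)
print(y)
```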

More things to do with sequences

Let's get back to focus on sequences. For loops operate on sequences.

In [76]:
A=range(2,10,2)
for a in A:
    print a
2
4
6
8

'enumerate' pairs each element of a sequence with its index. This can be very handy in a large number of situations.

In [77]:
for a in enumerate(A):
    print a
(0, 2)
(1, 4)
(2, 6)
(3, 8)

For example, it can be used to select the elements at certain positions in a list to make a new list.

In [78]:
print [val for idx,val in enumerate(A) if idx in [0,1,3]]
    
[2, 4, 8]
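
'enumerate' also accepts an optional second argument giving the number to start counting from, which is convenient for 1-based labels. A short sketch with illustrative names:

```python
evens = [2, 4, 6, 8]

# Count from 1 instead of 0 and unpack each (position, value) pair.
labelled = ['%d: %d' % (pos, val) for pos, val in enumerate(evens, 1)]

print(labelled)
```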

'zip' combines multiple sequences into a list of tuples by taking one element from each sequence in turn. The result is as long as the shortest input.

In [79]:
zip('hello',['p','a','k','i','s','t','a','n'])
Out[79]:
[('h', 'p'), ('e', 'a'), ('l', 'k'), ('l', 'i'), ('o', 's')]

Another example:

In [81]:
lots=zip([1,2,3],[4,5,6]) #zip two lists into a list of tuples
print lots
[(1, 4), (2, 5), (3, 6)]

We can 'unzip' by using '*':

In [82]:
zip(*lots)
Out[82]:
[(1, 2, 3), (4, 5, 6)]
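
A frequent use of 'zip' is building a dictionary from a list of keys and a list of values. The sketch below shows this together with the truncation to the shortest input and the unzip idiom. The variable names are illustrative, and the list() calls are there so that the code also runs under Python 3, where 'zip' returns an iterator rather than a list.

```python
names = ['h', 'e', 'l']
codes = [1, 2, 3, 4]                  # one element longer than names

pairs = list(zip(names, codes))       # stops after the shortest input
lookup = dict(zip(names, codes))      # keys from names, values from codes
letters, numbers = zip(*pairs)        # 'unzip' back into two tuples

print(pairs)
print(lookup['e'])
print(letters)
print(numbers)
```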

(c) Dr. Fayyaz ul Amir Afsar Minhas, DCIS, PIEAS, Pakistan.