We will now dive into Python programming, which will make it easier to pick up the PySpark API syntax later.
Below are the main data types in Python:
1. List -> a mutable sequence that can hold items of any type, e.g. [1, 2, 3, 4, 5]
2. Set -> an unordered collection of distinct items; {1, 1, 1, 2, 3, 4, 3} becomes {1, 2, 3, 4}
3. Dictionary -> key-value pairs, e.g. {'k1': 'v1'}
4. Tuple -> like a list but immutable, meaning we can't change any item once it is created; with x = (1, 2, 3, 4), an assignment like x[0] = 5 raises a TypeError.
Here, Set is an unordered data type (Dictionary historically was too, though since Python 3.7 dictionaries preserve insertion order).
dictionary.keys() and dictionary.items() return the keys and the key-value pairs of the dictionary respectively (as view objects, not plain lists).
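To tie these together, here is a minimal runnable sketch of the four data types and the dictionary methods above (the variable names are just for illustration):

ls = [1, 2, 3, 4, 5]            # List: mutable, ordered
st = {1, 1, 1, 2, 3, 4, 3}      # Set: duplicates dropped -> {1, 2, 3, 4}
d = {'k1': 'v1', 'k2': 'v2'}    # Dictionary: key-value pairs
tp = (1, 2, 3, 4)               # Tuple: immutable

print(d.keys())   # dict_keys(['k1', 'k2'])
print(d.items())  # dict_items([('k1', 'v1'), ('k2', 'v2')])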
Strings are defined like below, and several built-in methods are available on them:
s = " I am Shamus"
s.upper()
s.lower()
s.split('\t')
s.split('#')
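For illustration, here is what those calls return on s (note that s has a leading space, and the split on ' ' is an extra example not in the list above):

print(s.upper())     # ' I AM SHAMUS'
print(s.lower())     # ' i am shamus'
print(s.split('#'))  # [' I am Shamus'] -- no '#' in s, so a single-element list
print(s.split(' '))  # ['', 'I', 'am', 'Shamus'] -- the leading space yields an empty first item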
Functions are defined like below; an optional docstring describing the function's purpose can be placed between triple-quote (''') markers:
def func_name(par1, par2):
    '''
    DocString definition
    '''
    return par1 + par2
res = func_name(1, 2)  # invocation of the function defined above; res is 3
NOTE: Indentation is important in Python, as the indented lines form the body of the enclosing block. Here, the outer block is def func_name(par1, par2): and the inner content is the indented return par1 + par2. A small sketch of nested indentation follows below.
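As a small sketch of nested indentation (the function name and values here are illustrative, not from the original post):

def describe(n):
    # Everything indented under def belongs to the function body.
    if n % 2 == 0:
        # Everything indented under if belongs to the if-block.
        return 'even'
    return 'odd'

print(describe(4))  # even
print(describe(7))  # odd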
Lists and their methods are used like below:
ls = [1,2,3,4,5]
ls.pop(0)     # removes and returns the element at index 0
ls.pop()      # with no argument, removes and returns the last element
ls.append(3)  # adds 3 to the end of the list
ls[3] = 4     # updates the element at index 3
ls[-1] returns the last element of the list (negative indices count from the end).
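Tracing those operations in order, here is a quick sketch of how ls evolves:

ls = [1, 2, 3, 4, 5]
ls.pop(0)      # returns 1; ls is now [2, 3, 4, 5]
ls.pop()       # returns 5; ls is now [2, 3, 4]
ls.append(3)   # ls is now [2, 3, 4, 3]
ls[3] = 4      # ls is now [2, 3, 4, 4]
print(ls[-1])  # 4 -- the last element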
List comprehension is a Python feature (available since Python 2.0) that gives us syntactic sugar like below:
new_list = [x**2 for x in numbers]
which maps each element of the original list numbers (say [1, 2, 3]) into a new list, new_list -> [1, 4, 9].
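To see why it is called syntactic sugar, the comprehension above is equivalent to this plain loop (a minimal sketch):

numbers = [1, 2, 3]

# Long-hand version:
new_list = []
for x in numbers:
    new_list.append(x ** 2)

# The comprehension does the same in one line:
new_list = [x ** 2 for x in numbers]
print(new_list)  # [1, 4, 9]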
Usage of conditional statements and loop constructs is quite straightforward (a combined sketch follows this list):
if <cond>:, elif <cond>: and else:
while <cond>:
for num in numbers:
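Putting all three together in a minimal runnable sketch (the values are illustrative):

numbers = [1, 2, 3]
for num in numbers:
    if num < 2:
        print('small')
    elif num == 2:
        print('two')
    else:
        print('big')

count = 0
while count < len(numbers):
    count += 1
print(count)  # 3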
This covers the Python basics needed for working with Spark DataFrames. See you in the next blog on Spark DataFrames. Happy coding :)