Tightening those bolts: How to use the np.where() function in Python

Tade Ogundele
Jan 14, 2020


Libraries contain so many functions that keeping count of them gets tedious, yet they can make or mar your algorithm. They are to your code what nuts and bolts are to an engine: small but essential. That is why I wrote this article, which walks step by step through an array example and a DataFrame example to show how to use the np.where() function from the NumPy library in Python.

It was a bit confusing to me at first, but after some research and personal study time I finally got it, and I'd like to share it with you. There are two major uses of the np.where() function that I will discuss here.

A. Getting the index of values that satisfy a certain condition

B. Replacing values with one value where a condition holds and with another value where it does not

So let's see how the first use works.

After importing the necessary library, we create an array and apply the where function with just a condition, so we get the positions of the values that satisfy it. Check out the code and run it to see what 'position' gives you; that is the best way to fully grasp this.

Code (GitHub gist): https://gist.github.com/oluwatobij4/bd5e59da70b9132a02c0b77e76eddb0e
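In case the gist is not at hand, here is a minimal sketch of the same idea. The array values and the condition (greater than 25) are made up for illustration and are not taken from the gist; only the name 'position' follows the text above.

```python
import numpy as np

# Hypothetical sample data; the values are made up for illustration.
arr = np.array([12, 45, 7, 30, 51, 18])

# With only a condition, np.where returns a tuple of index arrays,
# one array per dimension of the input.
position = np.where(arr > 25)

print(position)       # (array([1, 3, 4]),)
print(arr[position])  # [45 30 51]
```

Because 'position' is a tuple, position[0] holds the actual indices for a one-dimensional array, and passing 'position' back into the array returns the matching values.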

For the second use, we import our .csv file and turn it into a DataFrame with pandas. Now that we have our DataFrame, suppose that, after examining the columns, we would like to change the values in a certain column based on a condition.
Let's inspect the age column in this DataFrame. We would like to split the ages into two groups: the victims in their 20s and younger, and those in their 30s and older. So we label them 'Twenties downwards' and 'Thirties upwards'.
All we have to do is use the np.where() function.

df['age'] = np.where(df['age'] < 30, 'Twenties downwards', 'Thirties upwards')

On the left-hand side, df['age'] is the column from the data set that we want to modify.
On the right-hand side, the first argument, df['age'] < 30, is the condition to be met. The cutoff is 30 because anyone under 30 is in their twenties or younger. The second argument, 'Twenties downwards', is the replacement at every index where the condition holds true, and the third argument, 'Thirties upwards', is the else option, i.e. the value used when the condition doesn't hold.
Running this code, we see that the values in the age column have been replaced with the labels we introduced.
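To try this without a .csv file, here is a small self-contained sketch. The DataFrame and its ages are made up, the cutoff of 30 is my assumption for separating "twenties and younger" from "thirties and older", and the 'age_group' column name is hypothetical. I write the labels to a new column so the original ages stay available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data loaded from a .csv file.
df = pd.DataFrame({"age": [19, 24, 30, 41, 29]})

# Where the condition is True use the first label, otherwise the second.
df["age_group"] = np.where(
    df["age"] < 30, "Twenties downwards", "Thirties upwards"
)

print(df)
```

Assigning the result back to df['age'] instead would overwrite the numbers with the labels, as in the line above.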

Try it out on your own and let me know what you come up with. There are a lot of .csv files on the internet for you to try this on.
Thank you!
