Data to Fish

5 ways to apply an IF condition in Pandas DataFrame

In this guide, you’ll see 5 different ways to apply an IF condition in Pandas DataFrame.

Specifically, you’ll see how to apply an IF condition for:

  • Set of numbers
  • Set of numbers and lambda
  • Strings
  • Strings and lambda
  • OR condition

Applying an IF condition in Pandas DataFrame

Let’s now review the following 5 cases:

(1) IF condition – Set of numbers

Suppose that you created a DataFrame in Python that has 10 numbers (from 1 to 10). You then want to apply the following IF conditions:

  • If the number is less than or equal to 4, then assign the value of ‘Yes’
  • Otherwise, if the number is greater than 4, then assign the value of ‘No’

This is the general structure that you may use to create the IF condition:

For our example, the Python code would look like this:

Here is the result that you’ll get in Python:
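
As a minimal sketch of case 1, the snippet below builds the ten numbers and applies both conditions with df.loc. The column names set_of_numbers and equal_or_lower_than_4 are assumptions chosen for illustration.

```python
import pandas as pd

# Ten numbers from 1 to 10, as described above
df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# General structure:
# df.loc[df['column name'] <condition>, 'new column name'] = 'value if condition is met'
df.loc[df['set_of_numbers'] <= 4, 'equal_or_lower_than_4'] = 'Yes'
df.loc[df['set_of_numbers'] > 4, 'equal_or_lower_than_4'] = 'No'

print(df)
```

The first four rows should show ‘Yes’ and the remaining six rows ‘No’.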

(2) IF condition – Set of numbers and lambda

You’ll now see how to get the same results as in case 1 by using lambda, where the conditions are the same as before: assign ‘Yes’ if the number is less than or equal to 4, and ‘No’ otherwise.

Here is the generic structure that you may apply in Python:

For our example:

This is the result that you’ll get, which matches with case 1:
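
A possible version of case 2, assuming the same hypothetical column names as in case 1. The lambda is applied to each value of the column through Series.apply.

```python
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# General structure:
# df['new column name'] = df['column name'].apply(lambda x: 'value if met' if <condition on x> else 'value if not met')
df['equal_or_lower_than_4'] = df['set_of_numbers'].apply(
    lambda x: 'Yes' if x <= 4 else 'No'
)

print(df)
```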

(3) IF condition – strings

Now, let’s create a DataFrame that contains only strings/text with 4 names: Jon, Bill, Maria and Emma.

The conditions are:

  • If the name is equal to ‘Bill’, then assign the value of ‘Match’
  • Otherwise, if the name is not ‘Bill’, then assign the value of ‘Mismatch’

Once you run the above Python code, you’ll see:
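
A minimal sketch of case 3 could look as follows. The column names first_name and name_match are assumptions used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'first_name': ['Jon', 'Bill', 'Maria', 'Emma']})

# One .loc assignment per condition
df.loc[df['first_name'] == 'Bill', 'name_match'] = 'Match'
df.loc[df['first_name'] != 'Bill', 'name_match'] = 'Mismatch'

print(df)
```

Only the ‘Bill’ row should be labelled ‘Match’.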

(4) IF condition – strings and lambda 

You’ll get the same results as in case 3 by using lambda:

And here is the output from Python:
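
The lambda variant of case 3 might be written like this, again assuming the hypothetical column names first_name and name_match.

```python
import pandas as pd

df = pd.DataFrame({'first_name': ['Jon', 'Bill', 'Maria', 'Emma']})

# A single expression covers both branches of the condition
df['name_match'] = df['first_name'].apply(
    lambda x: 'Match' if x == 'Bill' else 'Mismatch'
)

print(df)
```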

(5) IF condition with OR

Now let’s apply these conditions:

  • If the name is ‘Bill’ or ‘Emma’, then assign the value of ‘Match’
  • Otherwise, if the name is neither ‘Bill’ nor ‘Emma’, then assign the value of ‘Mismatch’

Run the Python code, and you’ll get the following result:
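
One way to express the OR condition is with the | operator between two parenthesised comparisons, and & for the complementary case. This is a sketch; the column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({'first_name': ['Jon', 'Bill', 'Maria', 'Emma']})

# Match when the name is 'Bill' OR 'Emma'
is_bill_or_emma = (df['first_name'] == 'Bill') | (df['first_name'] == 'Emma')
df.loc[is_bill_or_emma, 'name_match'] = 'Match'

# Mismatch when the name is neither 'Bill' nor 'Emma'
is_neither = (df['first_name'] != 'Bill') & (df['first_name'] != 'Emma')
df.loc[is_neither, 'name_match'] = 'Mismatch'

print(df)
```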

Applying an IF condition under an existing DataFrame column

So far you have seen how to apply an IF condition by creating a new column.

Alternatively, you may store the results under an existing DataFrame column.

For example, let’s say that you created a DataFrame that has 12 numbers, where the last two numbers are zeros:

‘set_of_numbers’: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0]

You may then apply the following IF conditions, and then store the results under the existing ‘set_of_numbers’ column:

  • If the number is equal to 0, then change the value to 999
  • If the number is equal to 5, then change the value to 555

Here are the before and after results, where the ‘5’ became ‘555’ and the 0’s became ‘999’ under the existing ‘set_of_numbers’ column:
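
The two replacements can be sketched with df.loc by writing back to the same column that is being tested:

```python
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0]})
print(df)  # before

# The target column is the same column used in the condition
df.loc[df['set_of_numbers'] == 0, 'set_of_numbers'] = 999
df.loc[df['set_of_numbers'] == 5, 'set_of_numbers'] = 555
print(df)  # after
```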

On another instance, you may have a DataFrame that contains NaN values. You can then apply an IF condition to replace those values with zeros, as in the example below:

Before you’ll see the NaN values, and after you’ll see the zero values:
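
A minimal sketch of the NaN replacement, assuming the NaN values sit at the end of the same hypothetical set_of_numbers column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, np.nan, np.nan]}
)
print(df)  # before: the last two rows are NaN

# isnull() builds the boolean mask that selects the NaN rows
df.loc[df['set_of_numbers'].isnull(), 'set_of_numbers'] = 0
print(df)  # after: the last two rows are 0
```

For this particular case, df['set_of_numbers'].fillna(0) would be an equivalent alternative.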

You just saw how to apply an IF condition in Pandas DataFrame. There are indeed multiple ways to apply such a condition in Python. You can achieve the same results by using either lambda, or just by sticking with Pandas.

At the end, it boils down to working with the method that is best suited to your needs.

Finally, you may want to check the following external source for additional information about Pandas DataFrame.

Set Pandas Conditional Column Based on Values of Another Column

  • August 9, 2021 February 22, 2022

There are many times when you may need to set a Pandas column value based on the condition of another column. In this post, you’ll learn all the different ways in which you can create Pandas conditional columns.

Loading a Sample Dataframe

Let’s begin by loading a sample Pandas dataframe that we can use throughout this tutorial.

We’ll begin by importing pandas and loading a dataframe using the .from_dict() method:

This returns the following dataframe:
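
As a stand-in sample, the sketch below loads a small dataframe with DataFrame.from_dict. The names, ages, and cities are hypothetical values; only the Age and City columns are implied by the rest of this tutorial.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame.from_dict({
    'Name': ['Jane', 'Melissa', 'John', 'Matt'],
    'Age': [23, 45, 35, 64],
    'City': ['London', 'Paris', 'Toronto', 'Atlanta'],
})

print(df)
```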

Using Pandas loc to Set Pandas Conditional Column

Pandas loc is incredibly powerful! With .loc, you can pass in a boolean mask that is built from a condition. Sometimes .loc is used simply to select rows and columns by label, but it can also be used to filter dataframes. The filtered rows can then have values assigned to them.

Let’s explore the syntax a little bit:

With the syntax above, we filter the dataframe using .loc and then assign a value to any row in the column (or columns) where the condition is met.

Let’s try this out by assigning the string ‘Under 30’ to anyone with an age less than 30, and ‘Over 30’ to anyone 30 or older.

Let's take a look at what we did here:

  • We assigned the string 'Over 30' to every record in the dataframe. To learn more about this, check out my post on creating new columns.
  • We then use .loc with a boolean mask on the Age column to filter down to rows where the age is less than 30. When this condition is met, the Age Category column is assigned the new value 'Under 30'.
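
The two steps above can be sketched as follows. The general pattern is df.loc[condition, 'column'] = value; the sample ages are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jane', 'Melissa', 'John', 'Matt'],
                   'Age': [23, 45, 35, 64]})

# Step 1: default value for every row
df['Age Category'] = 'Over 30'

# Step 2: overwrite only the rows where the mask is True
df.loc[df['Age'] < 30, 'Age Category'] = 'Under 30'

print(df)
```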

But what happens when you have multiple conditions? You could, of course, use .loc multiple times, but this is difficult to read and fairly unpleasant to write. Let's see how we can accomplish this using numpy's select() function.

Using Numpy Select to Set Values using Multiple Conditions

Similar to using .loc to create a conditional column in Pandas, we can use the numpy select() function.

Let's begin by importing numpy, giving it the conventional alias np:

Now, say we wanted to apply a number of different age groups, as below:

  • <20 years old,
  • 20-39 years old,
  • 40-59 years old,
  • 60+ years old

In order to do this, we'll create a list of conditions and corresponding values to fill:

Running this returns the following dataframe:
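
A sketch of the four age groups with np.select is shown below. The sample ages, the Age Group column name, and the 'Unknown' default are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [15, 23, 45, 64]})  # hypothetical ages

conditions = [
    df['Age'] < 20,
    (df['Age'] >= 20) & (df['Age'] < 40),
    (df['Age'] >= 40) & (df['Age'] < 60),
    df['Age'] >= 60,
]
values = ['<20 years old', '20-39 years old', '40-59 years old', '60+ years old']

# default is used for rows that match none of the conditions
df['Age Group'] = np.select(conditions, values, default='Unknown')

print(df)
```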

Let's break down what happens here:

  • We first define a list of conditions in which the criteria are specified. Recall that lists are ordered, so the conditions must be listed in the same order as their corresponding values. If more than one condition is true for a row, the first one in the list wins.
  • We then define a list of values to use, which corresponds to the values you'd like applied in your new column.

Something to consider here is that this can be a bit counterintuitive to write. You can similarly define a function to apply different values. We'll cover this in the section on using the Pandas .apply() method below.

One of the key benefits is that using numpy is very fast, especially when compared to using the .apply() method.

Using Pandas Map to Set Values in Another Column

The Pandas .map() method is very helpful when you're applying labels to another column. In order to use this method, you define a dictionary to apply to the column.

For our sample dataframe, let's imagine that we have offices in America, Canada, and France. We want to map the cities to their corresponding countries and apply an "Other" value for any other city.

When we print this out, we get the following dataframe returned:

What we can see here is that there is a NaN value associated with any City that doesn't have a corresponding country. If we want to apply "Other" to any missing values, we can chain the .fillna() method:
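
Both steps can be sketched together. The cities and the city-to-country dictionary are hypothetical; London is deliberately left out of the dictionary so that it produces a missing value.

```python
import pandas as pd

df = pd.DataFrame({'City': ['London', 'Paris', 'Toronto', 'Atlanta']})

city_dict = {'Paris': 'France', 'Toronto': 'Canada', 'Atlanta': 'USA'}

# Cities missing from the dictionary map to NaN
mapped = df['City'].map(city_dict)
print(mapped)

# Chaining fillna() replaces those missing values with 'Other'
df['Country'] = df['City'].map(city_dict).fillna('Other')
print(df)
```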

Using Pandas Apply to Apply a function to a column

Finally, you can apply built-in or custom functions to a dataframe using the Pandas .apply() method.

Let's take a look at both applying built-in functions such as len() and even applying custom functions.

Applying Python Built-in Functions to a Column

We can easily apply a built-in function using the .apply() method. Let's see how we can use the len() function to count how long each string in a given column is.

Take note of a few things here:

  • We apply the .apply() method to a particular column,
  • We omit the parentheses "()" after len, since we pass the function itself rather than calling it
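
For instance, with a hypothetical Name column, the length of each name could be computed like this:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jane', 'Melissa', 'John', 'Matt']})

# Pass len itself (no parentheses); pandas calls it once per value
df['Name Length'] = df['Name'].apply(len)

print(df)
```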

Using Third-Party Packages in Pandas Apply

Similarly, you can use functions from third-party packages. Let's use numpy's sqrt() function to find the square root of a person's age.
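
A short sketch with hypothetical ages and an assumed output column name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [23, 45, 35, 64]})

# np.sqrt is passed as a function object, just like len
df['Age Square Root'] = df['Age'].apply(np.sqrt)

print(df)
```

Because np.sqrt is vectorized, np.sqrt(df['Age']) gives the same values without .apply().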

Using Custom Functions with Pandas Apply

Something that makes the .apply() method extremely powerful is the ability to define and apply your own functions.

Let's revisit how we could use an if-else statement to create age categories as in our earlier example:
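
One possible custom function for the age groups is sketched below. The function name age_groups and the sample ages are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Age': [15, 23, 45, 64]})  # hypothetical ages

def age_groups(age):
    """Return the age-group label for a single age value."""
    if age < 20:
        return '<20 years old'
    elif age < 40:
        return '20-39 years old'
    elif age < 60:
        return '40-59 years old'
    else:
        return '60+ years old'

df['Age Group'] = df['Age'].apply(age_groups)

print(df)
```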

In this post, you learned a number of ways in which you can apply values to a dataframe column to create a Pandas conditional column, including using .loc, np.select(), Pandas .map() and Pandas .apply(). Each of these methods has a different use case that we explored throughout this post.

Learn more about Pandas methods covered here by checking out their official documentation:

  • Pandas Apply
  • Numpy Select

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials. View Author posts

How to Apply the If-Else Condition in a Pandas DataFrame

  • Use DataFrame.loc[] to Apply the if-else Condition in a Pandas DataFrame in Python
  • Use DataFrame.apply() to Apply the if-else Condition in a Pandas DataFrame in Python
  • Use NumPy.select() to Apply the if-else Condition in a Pandas DataFrame in Python
  • Use lambda With apply() to Apply the if-else Condition in a Pandas DataFrame in Python

Pandas is an open-source data analysis library in Python. It provides many built-in methods to perform operations on numerical data.

In some cases, we want to apply the if-else conditions on a Pandas dataframe to filter the records or perform computations according to some conditions. Python provides many ways to use if-else on a Pandas dataframe.

loc[] is a property of the Pandas data frame used to select or filter a group of rows or columns. In the following example, we will employ this property to filter the records that meet a given condition.

Here, we have a Pandas data frame consisting of the students’ data. Each loc[] assignment sets values for the rows that match one condition, so we use one statement per condition.

We will filter those students having marks greater than or equal to 60 in the first condition and assign their result as Pass in the new column Result. Similarly, we will set Fail for the rest of the students’ results in another condition.

Example Code:
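
A minimal sketch of this approach, with hypothetical student names and marks:

```python
import pandas as pd

# Hypothetical student data
df = pd.DataFrame({'Name': ['Ali', 'Sara', 'John', 'Mina'],
                   'Marks': [85, 40, 60, 59]})

# First condition: marks of 60 or more pass
df.loc[df['Marks'] >= 60, 'Result'] = 'Pass'
# Second condition: everything else fails
df.loc[df['Marks'] < 60, 'Result'] = 'Fail'

print(df)
```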

The apply() method uses the data frame’s axis (row or column) to apply a function. We can make our defined function that consists of if-else conditions and apply it to the Pandas dataframe.

Here, we have defined a function assign_Result() and applied it to the Marks column. The function consists of if-else conditions that assign the result based on the Marks, and it is invoked for every row of the column.
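
The function described above might look like this. The body of assign_Result() and the sample data are assumptions based on the Pass/Fail rule stated earlier.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Sara', 'John', 'Mina'],
                   'Marks': [85, 40, 60, 59]})

def assign_Result(marks):
    # Same rule as before: 60 or more is a pass
    if marks >= 60:
        return 'Pass'
    else:
        return 'Fail'

# apply() calls assign_Result once for every value in the Marks column
df['Result'] = df['Marks'].apply(assign_Result)

print(df)
```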

We can define multiple conditions for a column in a list and, in another list, the corresponding values to use when each condition is True. The select() method takes the list of conditions and the corresponding list of values as arguments, and the result is assigned to the Result column.
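
A sketch with the same hypothetical student data; the 'Unknown' default is an assumption added so that unmatched rows get a string value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Sara', 'John', 'Mina'],
                   'Marks': [85, 40, 60, 59]})

conditions = [df['Marks'] >= 60, df['Marks'] < 60]
values = ['Pass', 'Fail']

# Each row receives the value paired with the first condition that is True
df['Result'] = np.select(conditions, values, default='Unknown')

print(df)
```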

A lambda is a small anonymous function consisting of a single expression. We will use lambda with apply() on the Marks column.

The x contains the marks in the lambda expression. We applied the if-else condition to the x and assigned the result accordingly in the Result column.
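
Under the same assumptions about the data, the lambda version could be:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Sara', 'John', 'Mina'],
                   'Marks': [85, 40, 60, 59]})

# x is one value from the Marks column at a time
df['Result'] = df['Marks'].apply(lambda x: 'Pass' if x >= 60 else 'Fail')

print(df)
```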

I am Fariba Laiq from Pakistan. An android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement and convey my knowledge to others.

Conditional operation on Pandas DataFrame columns

Suppose you have an online store. The prices of the products are updated frequently. While calculating the final price of a product, you check whether the updated price is available. If it is not available, you use the last price available. Solution #1: We can use a conditional expression to check whether the column is present. If it is not present, we calculate the price using the alternative column.
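
A rough sketch of Solution #1 follows. The column names 'Updated Price', 'Last Price', and 'Discount', as well as the values, are hypothetical; here the updated-price column is absent, so the fallback column is used.

```python
import pandas as pd

# Hypothetical product data without an 'Updated Price' column
df = pd.DataFrame({'Product': ['A', 'B', 'C'],
                   'Last Price': [1200, 1500, 1600],
                   'Discount': [10, 10, 20]})  # discount in percent

# Conditional expression: use 'Updated Price' only if that column exists
price_col = 'Updated Price' if 'Updated Price' in df.columns else 'Last Price'

df['Final Price'] = df[price_col] - df[price_col] * df['Discount'] / 100

print(df)
```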

DataFrames with Conditionals

The use of conditionals allows us to select a subset of rows based on the value in each row. Writing a conditional to select rows based on the data in a single column is straightforward and was used when we selected all of the courses taught by the Statistics department with the following code:

The subset of rows where the Subject is exactly equal to STAT (57 rows).
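
To illustrate on a small scale, the sketch below uses a tiny hypothetical stand-in for the Course Catalog dataset (the real dataset is much larger, so the row counts differ).

```python
import pandas as pd

# Tiny hypothetical stand-in for the course catalog
df = pd.DataFrame({'Subject': ['STAT', 'CS', 'STAT', 'PSYC'],
                   'Number': [107, 125, 400, 100]})

# The comparison yields one True/False value per row;
# indexing with it keeps only the True rows
stat_courses = df[df.Subject == "STAT"]

print(stat_courses)
```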

Complex Conditionals with Multiple Parts

As we want to answer more complex questions, we need increasingly complex conditionals. To help understand how a computer works, you may be familiar with the idea that computers ultimately only think in zeros and ones:

  • When a computer stores a zero, we consider that to be False.
  • When a computer stores a one, we consider that to be True.

When we use conditionals, we are assigning a truth value to every single row in the DataFrame.

  • With our conditional df[df.Subject == "STAT"], all rows where the Subject data was "STAT" were assigned a truth value of True and kept in the final result; all other rows were labeled False and discarded.

Programming languages allow us to combine conditionals together in two key ways: with an AND ( & ) or with an OR ( | ).

Multiple Conditionals Joined with AND ( & )

When we combine two conditionals, we can ask Python to keep only the result where the first conditional AND the second conditional are both True.

Writing a conditional with multiple parts requires the use of parentheses around each individual conditional and an operation joining the two conditionals together. For example, using the Course Catalog dataset, we want all of the courses that are taught by Computer Science ( CS ) with a course number less than 300:

Both the first ( Subject is exactly equal to "CS" ) and second ( Number is less than 300 ) conditionals are checked independently. Since an AND ( & ) is used to join these two conditionals, the final truth value is True only when both conditionals are True:

All CS courses with course numbers less than 300 (17 rows).

Python allows us to keep chaining conditionals together, so it's no problem to have three conditionals:

All CS courses with course numbers less than 300 and exactly 3 credit hours (6 rows).
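
Both the two-part and three-part AND conditionals can be sketched on a small hypothetical catalog. The Credits column name is an assumption; the real dataset may name its credit-hours column differently.

```python
import pandas as pd

# Tiny hypothetical stand-in for the course catalog
df = pd.DataFrame({'Subject': ['CS', 'CS', 'CS', 'STAT'],
                   'Number': [125, 225, 411, 107],
                   'Credits': [4, 3, 3, 4]})

# Two conditionals: each one is wrapped in its own parentheses
cs_under_300 = df[(df.Subject == "CS") & (df.Number < 300)]

# Three conditionals joined with AND
cs_under_300_3cr = df[(df.Subject == "CS") & (df.Number < 300) & (df.Credits == 3)]

print(cs_under_300)
print(cs_under_300_3cr)
```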

Multiple Conditionals Joined with OR ( | )

Alternatively, Python can combine two conditionals together and keep the result when either the first conditional OR the second conditional is True (this includes when they're both True as well!). There are two major applications when this is useful:

  • Selecting multiple values of data from the same column (ex: all courses in "ARTD" OR "ARTE" OR "ARTF" ).
  • Selecting multiple values from different columns and keeping all matches (ex: all courses in "PSYC" OR courses that are only 1 credit hour).

Selecting Multiple Values of Data from the Same Column

Looking at the first example above, the University of Illinois has a lot of courses in art across many different sub-areas of art including: Art Design ( "ARTD" ), Art Education ( "ARTE" ), Art Foundation ( "ARTF" ), Art History ( "ARTH" ), and Art Studio ( "ARTS" ).

To include ALL courses from all five sub-areas of art listed above, we must join them together with an OR ( | ). Notice that it is necessary to specify each conditional completely each time even though we are always comparing the subject since Python has to evaluate each conditional independently and then combine the results together:

All courses in any subjects ARTD, ARTE, ARTF, ARTH, OR ARTS (221 rows).
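
A sketch of the five-way OR on hypothetical rows is shown below. The isin() call at the end is not part of the text above; it is a shorter pandas alternative that gives the same result when every comparison is on the same column.

```python
import pandas as pd

# Tiny hypothetical stand-in for the course catalog
df = pd.DataFrame({'Subject': ['ARTD', 'ARTE', 'ARTF', 'ARTH', 'ARTS', 'CS'],
                   'Number': [101, 201, 102, 110, 205, 125]})

# Each conditional is written out in full and joined with OR
art = df[(df.Subject == "ARTD") | (df.Subject == "ARTE") | (df.Subject == "ARTF")
         | (df.Subject == "ARTH") | (df.Subject == "ARTS")]

# Equivalent alternative using Series.isin
art_isin = df[df.Subject.isin(["ARTD", "ARTE", "ARTF", "ARTH", "ARTS"])]

print(art)
```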

Selecting Multiple Values from Different Columns and Keeping All Matches

To be considered a "full-time student" at most universities, you must be enrolled in at least 12 credit hours. If you are only enrolled in 11 credit hours, you may be interested in any course that will bump you up to exactly 12 credit hours (ex: a course worth exactly one credit hour) or a course you may be interested in (ex: something from the psychology ( "PSYC" ) department).

To include ALL of the results of all courses that are either one credit hour OR in the psychology department, we need an OR:

All courses that are exactly one credit hour OR in the psychology department (490 rows).
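
On hypothetical data, and again assuming a Credits column, an OR across two different columns could look like this:

```python
import pandas as pd

# Tiny hypothetical stand-in for the course catalog
df = pd.DataFrame({'Subject': ['PSYC', 'PSYC', 'ARTD', 'CS'],
                   'Number': [100, 201, 101, 125],
                   'Credits': [3, 1, 1, 4]})

# A row is kept if either conditional is True (or both)
one_credit_or_psyc = df[(df.Credits == 1) | (df.Subject == "PSYC")]

print(one_credit_or_psyc)
```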

Combining ANDs and ORs

The most complex conditionals will require a combination of both AND and OR statements. These can get incredibly tricky, but we can remember that Python will always process conditionals by only combining two conditionals together at a time.

Since Python combines only two conditionals together at any given time, it is critical we use parentheses to ensure we specify the order that we want these conditionals combined. For example, let's explore only junior level (300-399) courses in Chemistry or Physics. To do so:

  • The subject of the course must be CHEM or PHYS.
  • The course number must be greater than or equal to 300.
  • The course number must also be less than 400.

Naively writing this conditional results in the following code:

Default Order of Evaluation: AND before OR

If we do not use additional parentheses, Python will always combine the ANDs first and then the ORs and will do so in left-to-right order. This means that:

1. The first set of two conditionals combined will be the first AND conditional: (df.Subject == "PHYS") & (df.Number >= 300). The result contains all courses in PHYS with a number of 300 or larger.

2. The second set of two conditionals will be the result from #1 with the second AND: (Result of Step #1) & (df.Number < 400). The result contains all courses in PHYS with a number from 300-399.

3. The final set of conditionals will be combined using OR: (df.Subject == "CHEM") | (Result of Step #2). Since this is an OR, the result is ALL CHEM courses and then only the PHYS courses in the number 300-399.
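
The evaluation order in the steps above can be checked on a small hypothetical table. The naive expression here is inferred from those steps, and the explicit version adds the parentheses that Python applies implicitly.

```python
import pandas as pd

# Tiny hypothetical stand-in for the course catalog
df = pd.DataFrame({'Subject': ['CHEM', 'CHEM', 'PHYS', 'PHYS', 'PHYS'],
                   'Number': [102, 332, 210, 325, 510]})

# No extra parentheses: & binds more tightly than |
naive = df[(df.Subject == "CHEM") | (df.Subject == "PHYS")
           & (df.Number >= 300) & (df.Number < 400)]

# The same grouping written out explicitly
explicit = df[(df.Subject == "CHEM")
              | (((df.Subject == "PHYS") & (df.Number >= 300)) & (df.Number < 400))]

print(naive)
```

On this toy data the 100-level CHEM course is kept, because the number range only restricts the PHYS rows.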

We can verify our result by running the code:

The output of incorrect logic that does not use parentheses, which includes 500-level PHYS courses (92 rows).

Notice that the code appears correct until we scroll down! The courses in Chemistry start at 300, but the last five rows show us that the courses in Physics include 500-level courses -- yikes!

Order of Evaluation: Using Parentheses to Specify Order

Python uses parentheses in a similar way to basic mathematics where the inner-most operations are done first. In our example, we want to make sure that all Chemistry and Physics courses are combined first, and only then can we limit the range of course numbers to the junior level.

By grouping both of these logical operations together, our new conditional can be thought of as a combination of two complex conditionals:

(df.Subject == "CHEM") | (df.Subject == "PHYS"), selecting only courses that are Chemistry OR Physics

(df.Number >= 300) & (df.Number < 400) , selecting only courses between 300 AND 399.

Joining these two conditionals together with an AND results in the exact output we expect:

All 300-level courses in chemistry or physics (11 rows).
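
With the two grouped conditionals joined by an AND, a sketch on hypothetical rows looks like this:

```python
import pandas as pd

# Tiny hypothetical stand-in for the course catalog
df = pd.DataFrame({'Subject': ['CHEM', 'CHEM', 'PHYS', 'PHYS', 'PHYS'],
                   'Number': [102, 332, 210, 325, 510]})

# Outer parentheses force the OR and the range check to be evaluated first
junior = df[((df.Subject == "CHEM") | (df.Subject == "PHYS"))
            & ((df.Number >= 300) & (df.Number < 400))]

print(junior)
```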

Pandas Create Conditional Column in DataFrame

  • Post author: Komali
  • Post category: Pandas
  • Post last modified: March 27, 2024

You can create a conditional column in pandas DataFrame by using np.where(), np.select(), Series.map(), DataFrame.assign(), DataFrame.apply(), DataFrame.loc[]. Additionally, you can also use the mask() and transform() methods and lambda functions to handle single and multiple conditions. In this article, I will explain several ways to create a new conditional DataFrame column, with examples.

Adding a new column by conditionally checking values on existing columns is required when you would need to curate the DataFrame or derive a new column from the existing columns.

Key Points –

  • Use boolean conditions to create a new column based on specific criteria within a Pandas DataFrame.
  • Utilize the DataFrame syntax, such as df['new_column'] , to assign values to the newly created column.
  • Use NumPy’s np.where function as a concise way to apply conditional logic and assign values to the new column.
  • Use .loc to set values for a new column based on specific conditions; this avoids chained assignment and ensures modifications are made to the original DataFrame.
  • Leverage the apply function along with a lambda function to apply conditional logic to each row of the DataFrame.
  • Handle multiple conditions by combining them with logical operators like & (AND), | (OR), and ~ (NOT) to create complex conditional expressions for the new column.

1. Quick Examples of Pandas Create Conditional DataFrame Column

If you are in a hurry, below are some quick examples of creating conditional pandas DataFrame columns.

Let’s create a pandas DataFrame with a few rows and columns and execute these examples and validate results. Our DataFrame contains column names Courses, Fee and Duration.

Yields below output.
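
A small DataFrame with these three columns could be built as follows. The row values are hypothetical and chosen only so the later examples have something to work on.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the text
technologies = {
    'Courses': ['Spark', 'PySpark', 'Spark', 'Python', 'PySpark'],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', '35days', '60days'],
}
df = pd.DataFrame(technologies)

print(df)
```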

2. Create Conditional DataFrame Column by np.where() function

To create a conditional DataFrame column in pandas, use the np.where() function. Although pandas also has a similar where() function, numpy.where() is very different. The difference is that NumPy's where() function provides greater flexibility and treats the given condition differently from pandas.

pandas where() function only allows for updating the values that do not meet the given condition. However, the where function of Numpy allows for updating values that meet and do not meet the given condition.
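
The difference can be sketched side by side. The Discount column and the discount amounts are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Courses': ['Spark', 'PySpark', 'Spark', 'Python', 'PySpark'],
                   'Fee': [22000, 25000, 23000, 24000, 26000]})

# np.where(condition, value_if_true, value_if_false) sets both branches
df['Discount'] = np.where(df['Courses'] == 'Spark', 1000, 2000)

# Series.where keeps values where the condition is True
# and only replaces the rest
capped_fee = df['Fee'].where(df['Fee'] <= 24000, 24000)

print(df)
print(capped_fee)
```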

Another way to create a column conditionally.

Yields the same output as above.

Similarly, you can also create by using Series.map() and lambda. The lambda functions are defined using the keyword lambda. They can have any number of arguments but only one expression. These are very helpful when we have to perform small tasks with less code.
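
A minimal sketch with Series.map() and a lambda, using the same hypothetical discount rule:

```python
import pandas as pd

df = pd.DataFrame({'Courses': ['Spark', 'PySpark', 'Spark', 'Python', 'PySpark']})

# The lambda receives one course name at a time
df['Discount'] = df['Courses'].map(lambda x: 1000 if x == 'Spark' else 2000)

print(df)
```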

3. Create Conditional DataFrame Column by numpy.select() function

You can create a conditional DataFrame column by checking multiple columns using numpy.select() function. The select() function is more capable than the previous methods. We can use it to give a set of conditions and a set of values. Thus, we are able to assign a specific value for each condition.

When no condition matches, it assigns the default value to the new column.
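A small sketch of numpy.select() is shown here. The conditions, choices, and the Discount column are illustrative assumptions; note that when several conditions are true for a row, the first one in the list wins.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [22000, 25000, 24000, 26000],
})

# One entry in `choices` per entry in `conditions`; the first match wins.
conditions = [
    (df["Courses"] == "Spark") & (df["Fee"] >= 22000),
    df["Fee"] >= 25000,
]
choices = [1000, 2000]

# Rows matching no condition receive the default value.
df["Discount"] = np.select(conditions, choices, default=0)
print(df)
```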

4. Using Series.map() Function

We can use the Series.map() function to achieve the same goal. It is a straightforward method: a dictionary maps each value of an existing column (the key) to the value for the newly added column. Series.map() substitutes each value in a Series with another value according to the input correspondence; values with no matching key become NaN.
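A minimal sketch of dictionary-based mapping, assuming hypothetical data and a hypothetical discount_by_course dictionary that deliberately omits one course to show the NaN behaviour:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [22000, 25000, 24000, 26000],
})

# Hypothetical lookup table; "pandas" is intentionally left out.
discount_by_course = {"Spark": 1000, "PySpark": 2000, "Python": 1500}

# Values without a matching key become NaN.
df["Discount"] = df["Courses"].map(discount_by_course)
print(df)
```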

5. Using Dict to Create Conditional DataFrame Column

Another way to create a conditional DataFrame column in pandas is to build a dict of key-value pairs and look each value up with dict.get(). The get() method returns the value for the specified key; if the key is not found, it returns None (or a default you supply) instead of raising a KeyError as dict[key] would.
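One way this could look, using a list comprehension over a column. The dictionary, the Discount column, and the fallback value of 0 are assumptions made for the sketch.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [22000, 25000, 24000, 26000],
})

# Hypothetical lookup table; "pandas" is intentionally left out.
discount_by_course = {"Spark": 1000, "PySpark": 2000, "Python": 1500}

# dict.get() falls back to the supplied default (0) for missing keys.
df["Discount"] = [discount_by_course.get(course, 0) for course in df["Courses"]]
print(df)
```

Without the second argument, get() returns None for a missing key.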

6. Using Series.apply() Function

We can use the Series.apply() function when there are more than two possible values; in that case, a dictionary can map each key to its new value. This gives a lot of flexibility when there is a larger number of categories, each needing a different value in the newly added column.
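A sketch of Series.apply() with a dictionary lookup follows. The category names and the Category column are hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [22000, 25000, 24000, 26000],
})

# Hypothetical mapping from course name to category.
category_by_course = {
    "Spark": "Big data",
    "PySpark": "Big data",
    "Python": "Programming",
}

# apply() calls the lambda once per value; unknown courses fall back to "Other".
df["Category"] = df["Courses"].apply(
    lambda course: category_by_course.get(course, "Other")
)
print(df)
```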

7. Using DataFrame.assign() Method

The DataFrame.assign() function is used to assign new columns to a DataFrame. It returns a new object with all the original columns in addition to the new ones. Note that all the examples above create a new column on the existing DataFrame, whereas this one creates a new DataFrame with the new column.
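The following sketch illustrates that assign() leaves the original DataFrame untouched. The data, the threshold, and the Discount column are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [22000, 25000, 24000, 26000],
})

# assign() returns a new DataFrame; df itself is not modified.
df2 = df.assign(Discount=np.where(df["Fee"] >= 25000, 1000, 2000))

print(df.columns.tolist())
print(df2)
```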

8. Using Multiple Columns by Using DataFrame.assign() Method

If you need to check multiple columns to create a new column, combine the conditions inside DataFrame.assign().
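A possible sketch, assuming hypothetical data and conditions on the Fee and Duration columns:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [22000, 25000, 24000, 26000],
    "Duration": ["30days", "40days", "35days", "60days"],
})

# The callable receives the DataFrame; & combines the two column conditions.
df2 = df.assign(
    Discount=lambda d: np.where(
        (d["Fee"] >= 25000) & (d["Duration"] == "40days"), 1000, 0
    )
)
print(df2)
```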

9. Using DataFrame.loc[] Method

The loc[] property is used to access a group of rows and columns by label(s) or a boolean array. It is primarily label-based, but it also accepts a boolean array, which makes it suitable for assigning values under a single condition.
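A minimal sketch for a single condition, with hypothetical data and an assumed Discount column that is first filled with a default value:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [22000, 25000, 24000, 26000],
})

# Start from a default, then overwrite the rows where the condition holds.
df["Discount"] = 2000
df.loc[df["Fee"] >= 25000, "Discount"] = 1000
print(df)
```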

NOTE: You can also use loc[] with multiple conditions, combined with & or |, to create a new column in a pandas DataFrame.
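For multiple conditions, a sketch along the same lines (hypothetical data and thresholds; each condition is wrapped in parentheses because & and | bind more tightly than comparisons):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [22000, 25000, 24000, 26000],
})

df["Discount"] = 0
# | means OR: the row qualifies if either condition is true.
df.loc[(df["Courses"] == "Spark") | (df["Fee"] > 25000), "Discount"] = 1000
print(df)
```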

10. Using DataFrame.apply() method with lambda Function

You can also create a conditional DataFrame column by using the DataFrame.apply() method with a lambda function. apply() applies a function along an axis of the DataFrame; with axis=1 the function receives one row at a time. Lambda functions are defined using the keyword lambda.
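A row-wise sketch, assuming hypothetical data and a hypothetical rule that combines two columns:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [22000, 25000, 24000, 26000],
    "Duration": ["30days", "40days", "35days", "60days"],
})

# axis=1 passes each row to the lambda as a Series.
df["Discount"] = df.apply(
    lambda row: 1000
    if row["Fee"] >= 25000 and row["Duration"] != "60days"
    else 0,
    axis=1,
)
print(df)
```

Row-wise apply() is flexible but generally slower than vectorized options such as np.where() on large DataFrames.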

11. Pandas Create Conditional Column Using Mask() Methods

The mask() method can also create a conditional column. It replaces values where the condition is True.

NOTE: To replace values where the condition is False, use the Series.where() method instead. where() keeps each value for which the condition is True and replaces the others.
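The two methods can be compared side by side in a short sketch. The data, the threshold, and the column names Discount and HighFee are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [22000, 25000, 24000, 26000],
})

high_fee = df["Fee"] >= 25000

# mask(): replace the default 2000 with 1000 where the condition is True.
df["Discount"] = pd.Series(2000, index=df.index).mask(high_fee, 1000)

# where(): keep Fee where the condition is True, replace the rest with 0.
df["HighFee"] = df["Fee"].where(high_fee, 0)
print(df)
```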

12. Using transform() with lambda function

Finally, you can use the transform() method with a lambda function. transform() applies the given function and returns an object with the same axis shape as the original, containing the transformed values.
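A small sketch with Series.transform(), using hypothetical data. The lambda is deliberately written as an arithmetic expression so that it gives the same result whether it receives a single value or the whole Series.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python", "pandas"],
    "Fee": [22000, 25000, 24000, 26000],
})

# (fee >= 25000) is True/False, so multiplying by 1000 yields 1000 or 0.
df["Discount"] = df["Fee"].transform(lambda fee: (fee >= 25000) * 1000)
print(df)
```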

Frequently Asked Questions on Pandas Create Conditional Column

To create a conditional column in Pandas, you can use boolean conditions, apply functions, or np.where to define the conditions and assign values to the new column.

You can use logical operators like & (AND), | (OR), and ~ (NOT) to combine multiple conditions and create complex expressions for the conditional column.

Boolean indexing involves using boolean conditions to filter rows in a DataFrame. By applying boolean indexing, you can selectively assign values to a new column based on specific conditions.

Using .loc with a boolean condition lets you modify the original DataFrame when creating a conditional column. Unlike chained indexing such as df[mask]['col'] = value, it ensures that changes are made directly to the DataFrame rather than to a copy.

Besides np.where, you can use other methods such as boolean indexing, apply functions with lambda expressions, or assignment through .loc to achieve the same goal of creating conditional columns. Choose the method that best fits your specific use case.

In this article, you have learned how to create a conditional DataFrame column in pandas by using np.where(), np.select(), DataFrame.apply(), DataFrame.assign(), Series.map(), loc[], mask(), transform(), and lambda functions, based on single and multiple conditions.


Pandas – Using DataFrame.assign() method (5 examples)

Introduction

The assign() method in Pandas is a powerful tool for adding new columns to a DataFrame in a fluent and flexible way. This method is particularly useful in data preprocessing, feature engineering, and exploratory data analysis, enabling data scientists and analysts to prepare and transform data efficiently. In this tutorial, we will explore the assign() method through five comprehensive examples, ranging from basic to more advanced use cases.

Syntax & Parameters

Pandas is a core library in the Python data science ecosystem, known for its versatile and high-performance data manipulation capabilities. The assign() method exemplifies these qualities by offering a dynamic approach to modifying DataFrames. Before diving into examples, it’s crucial to understand the syntax of assign(): DataFrame.assign(**kwargs)

Where **kwargs are keyword arguments in the form of column=value . Here, ‘column’ is the name of the new or existing column, and ‘value’ can be a scalar, array-like, or a callable.

Example 1: Basic Usage

Let’s begin with a basic example by creating a DataFrame and adding a new column:
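A minimal sketch of this step, assuming a small DataFrame with hypothetical values in columns 'A' and 'B':

```python
import pandas as pd

# Hypothetical starting data.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# assign() returns a new DataFrame with the extra column 'C'.
df = df.assign(C=df["A"] * 2)
print(df)
```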

This example demonstrates how to add a new column ‘C’ that is twice the value of column ‘A’.

Example 2: Using Callables

The assign() method allows for the use of callables, enhancing its flexibility. Here’s how:
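A sketch of the callable form, assuming a hypothetical DataFrame that already contains columns 'A', 'B', and 'C':

```python
import pandas as pd

# Hypothetical data; 'C' is twice 'A', as in the basic example.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [2, 4, 6]})

# The lambda receives the DataFrame and returns the values for 'D'.
df = df.assign(D=lambda x: x["A"] + x["C"])
print(df)
```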

This illustrates adding a new column ‘D’ by applying a lambda function that sums columns ‘A’ and ‘C’.

Example 3: Chaining Assignments

The real power of assign() shines when used in a chaining method to perform multiple operations in a single line:
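One way the chained form might look, with hypothetical starting values:

```python
import pandas as pd

# Hypothetical starting data.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Each assign() sees the columns created by the previous call.
result = (
    df.assign(C=lambda x: x["A"] * 2)
      .assign(D=lambda x: x["A"] + x["C"])
)
print(result)
```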

This compact syntax illustrates how to sequentially add columns ‘C’ and ‘D’, showcasing the method’s efficiency in data manipulation.

Example 4: Conditional Column Creation

Now, let’s see how to add a new column based on conditions:
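A sketch of a conditional column inside assign(), using np.where(). The threshold of 2 and the sample values are assumptions; only the column name 'E' and the labels 'High' and 'Low' come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical starting data.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Values of 'A' above the assumed threshold are labelled 'High'.
df = df.assign(E=lambda x: np.where(x["A"] > 2, "High", "Low"))
print(df)
```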

This demonstrates dynamically creating a new column ‘E’ that categorizes values from column ‘A’ into ‘High’ and ‘Low’ based on a condition.

Example 5: Using External Functions

Finally, let’s utilize an external function within assign() for more complex operations:
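A sketch with a named function passed to assign(). The function scaled_sum and its logic are hypothetical; only the column name 'F' comes from the text.

```python
import pandas as pd

# Hypothetical starting data.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})


def scaled_sum(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: add 'A' and 'B', then scale by 10."""
    return (frame["A"] + frame["B"]) * 10


# assign() calls the function with the DataFrame as its argument.
df = df.assign(F=scaled_sum)
print(df)
```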

This example shows how to integrate an external function to create a new column ‘F’, further demonstrating the method’s adaptability.

This tutorial provided a thorough exploration of the assign() method in Pandas, showcasing its versatility through five practical examples. By leveraging assign() , data manipulation becomes more concise and expressive, enabling efficient and dynamic DataFrame transformations.

