Last Updated: October 3, 2020
When I first started out using Pandas Dataframe, I naively coded using standard for loops, etc to iterate through the dataframe to get the value that I wanted.
With many of these specialized structures, pandas dataframe has its own optimized iterations that you can take advantage of in order to enhance your program’s speed. For small programs, this may not be an issue but as you scale the difference in speed can range from seconds to minutes, or from hours to days.
This following article is intended to provide minor tips and tricks to enhance your speed.
Anytime, when working with any data structure, always keep in mind the difference between loading and retrieving data and the balance between the two.
General Rule of thumb: When possible, avoid for-looping dataframes. There is almost always a faster method that dataframes offer.
Get specific value if you know the column name and row index:
Get specific value if you know the column index and row index:
Get specific value if you only know the column name:
Get specific value if you only know the row index:
- (Will add in the future)
Get specific value if you only know the column index:
- (Will add in the future): df_new = df[df.iloc[1:2] == 4]
Get specific values based on specific criteria:
Get first element of same values:
Get rid of copy warning error (reset index):
Merge dataframes with same columns, stack vertically by row
Sort by two different criteria by column (if first criteria is the same):
- (Will add in the future): df = df.sort_values([‘val1’, ‘val2’], ascending=[False, False])
Sort by row:
- (Will add in the future)
Remove all NaN values from dataframe
- (Will add in the future)
Replace all NaN values from dataframe
- (Will add in the future)
Replace specific values with other values
- (Will add in the future)
Perform operation on subset of dataframe
- (Will add in the future)
Get the largest/smallest value of each row in dataframe
- (Will add in the future): df_new = df.max(axis = 1)
Get the nth-largest/nth-smallest value of each row in dataframe
- (Will add in the future)
Perform operation on specific row on dataframe
- (Will add in the future)
Perform operation on each row on dataframe
- (Will add in the future)
Add a new column, if specific condition met = value, else = other value
- (Will add in the future)
Add a new column, if many different specific condition= different values
- (Will add in the future)
Return dataframe if specific dataframe values exists in list
- (Will add in the future)
- new_df= df[df[‘val1’].isin(list(df[‘val1’]))]
Shuffle dataframe rows
df = df.sample(frac=1).reset_index(drop=True)
Further edits will be made in the future
If you want to see any additional, tips/tricks added or if you have a particularly difficult question you want answered, feel free to comment below and I will add it to this post.