24 Functions To Perform Exploratory Data Analysis In Python Using Pandas 2

Here is a quick list of 24 functions to perform Exploratory Data Analysis in Python.

| EDA function | Functionality | Syntax (df = DataFrame, col = column name, pd = pandas, sr = pandas Series) |
|---|---|---|
| unique() | Returns the unique values in a column | df['col'].unique() or sr.unique() |
| nunique() | Returns the number of unique values | df.nunique() or df['col'].nunique() |
| isnull(), isna() | Return a boolean mask marking null values; chain sum() to get the number of null values per column | df.isnull(), df.isna(), df.isna().sum() |
| hasnans | Returns True if there is any null value in the Series | sr.hasnans |
| sample() | Returns random rows | df.sample(n), where n = number of sample rows |
| info() | Prints information about rows, columns, data types, indices, number of non-null values and memory used | df.info() |
| shape | Returns the number of rows and columns as a tuple | shape = df.shape |
| head() | Returns the first 5 rows by default | df.head() |
| tail() | Returns the last 5 rows by default | df.tail() |
| size | Number of values in a DataFrame (rows x columns) | df.size |
| describe() | One of the most important functions: gives summary statistics for the DataFrame, such as the number of non-null values, mean, standard deviation, min, percentiles and max | df.describe() |
| dtypes | Returns the data type of each column in a DataFrame | df.dtypes |
| select_dtypes() | Returns the subset of columns whose data types match the include or exclude keyword; for example, include='int' keeps only integer columns and exclude='int' drops them | df.select_dtypes(include=None, exclude=None) |
| count() | Gives the count of non-null values; by default the result is per column, and passing axis=1 or axis='columns' gives the count per row | df.count() or df['col'].count() |
| value_counts() | Returns a pandas Series (not a DataFrame) containing the count of each unique value, with optional parameters (normalize, ascending, dropna) | df['col'].value_counts() or df.value_counts() |
| nsmallest() and nlargest() | Return the n rows with the smallest or largest values in the given column | df.nsmallest(n, 'col'), df.nlargest(n, 'col') |
| corr() | Returns the pairwise correlation between columns of a DataFrame; when used with two Series, returns the correlation between them. Null values are excluded | df.corr() or sr1.corr(sr2) |
| plot() | Creates visualizations; the commonly supported chart kinds are bar, barh, hist, box, area, density, pie, scatter and line | df.plot(x='col1', y='col2', kind='bar') |
| min() | Returns the minimum value of each column | df.min() |
| max() | Returns the maximum value of each column | df.max() |
| mean() | Returns the mean of each column | df.mean() |
| sort_values() | Sorts rows by the values of the specified columns | df.sort_values(by=['col'], ascending=True), df.sort_values(by=['col'], ascending=False) |
| drop() | Removes the specified rows or columns | df = df.drop(columns=['col']) or df = df.drop(index=[0]) |
| dropna() | Removes rows with null values and returns a new DataFrame | df = df.dropna(); optional parameters: axis, how, thresh, subset, inplace |
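
As a rough sketch of how several of these calls fit together in a first pass over a dataset, the snippet below builds a small DataFrame and runs a handful of the functions from the list. The column names (city, sales, units) and all values are hypothetical, invented only for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai", None],
    "sales": [250.0, 400.0, None, 150.0, 300.0],
    "units": [5, 8, 3, 2, 6],
})

# Structure of the data
n_rows, n_cols = df.shape            # (rows, columns)
total_values = df.size               # rows * columns
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Missing values
nulls_per_col = df.isna().sum()      # number of nulls in each column
sales_has_nan = df["sales"].hasnans  # True if the Series holds any null

# Unique values and frequencies
unique_cities = df["city"].nunique()     # nulls are excluded by default
city_counts = df["city"].value_counts()  # Series indexed by city name

# Summary statistics
summary = df.describe()              # numeric columns only by default
mean_sales = df["sales"].mean()      # nulls are skipped
top2 = df.nlargest(2, "sales")       # the 2 rows with the highest sales
sales_units_corr = df["sales"].corr(df["units"])  # pairs with nulls are ignored

# Sorting and cleaning
by_units = df.sort_values(by=["units"], ascending=False)
clean = df.dropna()                  # drops rows containing any null
no_city = df.drop(columns=["city"])  # drops one column

print(df.head())
print(nulls_per_col)
print(summary)
```

Every call here returns a new object and leaves df unchanged, so the steps can be run in any order while exploring.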



