Here is a quick list of 24 functions for performing Exploratory Data Analysis (EDA) in Python with pandas.
EDA functions | Functionality | Syntax (df = DataFrame, col = column name, pd = pandas, sr = pandas Series) |
---|---|---|
unique() | Returns the unique values in a column | df['col'].unique() or sr.unique() |
nunique() | Returns the number of unique values (per column when called on a DataFrame) | df.nunique() or df['col'].nunique() |
isnull(), isna() | Return a Boolean mask marking the null values; chain .sum() to get the number of null values per column | df.isnull(), df.isna(), df.isna().sum() |
hasnans | Returns True if the Series contains any null value | sr.hasnans |
sample() | Returns a random sample of rows | df.sample(n), where n is the number of rows to sample |
info() | Prints a summary of the data frame: number of rows and columns, data types, index, number of non-null values and memory used | df.info() |
shape | Returns a tuple with the number of rows and columns | df.shape |
head() | Returns the first 5 rows by default | df.head() or df.head(n) |
tail() | Returns the last 5 rows by default | df.tail() or df.tail(n) |
size | Returns the number of values in a data frame (rows × columns) | df.size |
describe() | One of the most important functions: returns summary statistics for the numeric columns, such as the number of non-null values, mean, standard deviation, min, max and percentiles | df.describe() |
dtypes | Returns the data type of each column in a data frame | df.dtypes |
select_dtypes() | Returns the subset of columns whose data types match the include or exclude keyword; for example, include='int' keeps only the integer columns and exclude='int' drops them. At least one of the two must be supplied | df.select_dtypes(include=None, exclude=None) |
count() | Gives the count of non-null values in a data frame; by default the result is per column; to count per row, pass axis=1 or axis='columns' | df.count() or df['col'].count() |
value_counts() | Returns a pandas Series (not a DataFrame) containing the count of each unique value, with optional parameters (normalize, ascending, dropna) | df['col'].value_counts() or df.value_counts() |
nsmallest() and nlargest() | Return the n rows with the smallest or largest values in the given column | df.nsmallest(n, 'col'), df.nlargest(n, 'col') |
corr() | Returns the pairwise correlation between the columns of a data frame; when called on one Series with another, returns the correlation between those two Series. Null values are excluded | df.corr() or sr.corr(other_sr) |
plot() | Creates visualizations; the commonly supported chart kinds are bar, barh, hist, box, area, density, pie, scatter and line | df.plot(x='col1', y='col2', kind='bar') |
min() | Returns the minimum value of each column | df.min() |
max() | Returns the maximum value of each column | df.max() |
mean() | Returns the mean of each column | df.mean() |
sort_values() | Sorts the rows by the values in the specified column(s) | df.sort_values(by=['col'], ascending=True) or df.sort_values(by=['col'], ascending=False) |
drop() | Removes the specified rows or columns | df = df.drop(columns=['col']) or df = df.drop(index=[0]) |
dropna() | Removes rows (or columns) with null values and returns a new data frame | df = df.dropna(), with optional parameters axis, how, thresh, subset, inplace |
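
To see how several of the functions in the table fit together, here is a minimal sketch of a first-pass EDA. The data frame, its column names (city, temp, visits) and its values are hypothetical and invented purely for illustration; the pandas calls themselves are the ones listed in the table.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Oslo", "Pune", "Lima", "Oslo"],
        "temp": [4.0, 19.5, np.nan, 31.0, 18.0, 6.5],
        "visits": [10, 25, 7, 40, 22, 12],
    }
)

# Structure of the data frame
n_rows, n_cols = df.shape                      # (6, 3)
n_values = df.size                             # 18 = rows * columns
numeric = df.select_dtypes(include="number")   # keeps temp and visits

# Missing values
nulls_per_col = df.isna().sum()                # null count per column
temp_has_nulls = df["temp"].hasnans            # any null in the Series?
non_null_counts = df.count()                   # non-null count per column

# Unique values
cities = df["city"].unique()                   # distinct values, in order of appearance
n_cities = df["city"].nunique()                # how many distinct values
city_counts = df["city"].value_counts()        # Series: value -> frequency

# Summary statistics and relationships
summary = df.describe()                        # count, mean, std, min, quartiles, max
top2 = df.nlargest(2, "visits")                # the 2 rows with the most visits
temp_visits_corr = df["temp"].corr(df["visits"])  # nulls are skipped

# Cleaning and ordering (each call returns a new data frame)
clean = df.dropna(subset=["temp"])             # drop rows where temp is null
no_city = df.drop(columns=["city"])            # drop a column by name
by_visits = df.sort_values(by=["visits"], ascending=False)
```

None of these calls modifies df in place, so each result can be inspected separately, for example with print(summary) or by_visits.head().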