Overview

Dataset statistics

Number of variables: 12
Number of observations: 5110
Missing cells: 201
Missing cells (%): 0.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.7 MiB
Average record size in memory: 351.2 B
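
The missing-cell percentage is taken over all cells (observations times variables), not over rows. A small Python sketch of that arithmetic, using only the figures reported in this overview; the variable names are illustrative:

```python
# Figures copied from the dataset statistics above.
n_variables = 12
n_observations = 5110
missing_cells = 201

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of all cells

# The report attributes all 201 missing cells to one column (bmi),
# so the rate within that column is much higher than the overall rate.
bmi_missing_pct = 100 * missing_cells / n_observations

print(f"{missing_pct:.1f}% of cells, {bmi_missing_pct:.1f}% of bmi values")
```

This reproduces the 0.3% shown here and the 3.9% reported for bmi in the warnings.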

Variable types

Numeric: 4
Categorical: 4
Boolean: 4

Warnings

smoking_status is highly correlated with ever_married and 1 other field (High correlation)
ever_married is highly correlated with smoking_status and 1 other field (High correlation)
age is highly correlated with smoking_status and 2 other fields (High correlation)
work_type is highly correlated with age and 1 other field (High correlation)
bmi is highly correlated with work_type (High correlation)
ever_married is highly correlated with work_type (High correlation)
work_type is highly correlated with ever_married (High correlation)
bmi has 201 (3.9%) missing values (Missing)
id has unique values (Unique)

Reproduction

Analysis started: 2021-06-20 04:22:52.364750
Analysis finished: 2021-06-20 04:23:04.903754
Duration: 12.54 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

id
Real number (ℝ≥0)

UNIQUE

Distinct: 5110
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36517.82935
Minimum: 67
Maximum: 72940
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 40.0 KiB

Quantile statistics

Minimum: 67
5-th percentile: 3590.45
Q1: 17741.25
median: 36932
Q3: 54682
95-th percentile: 69217.95
Maximum: 72940
Range: 72873
Interquartile range (IQR): 36940.75

Descriptive statistics

Standard deviation: 21161.72162
Coefficient of variation (CV): 0.5794901285
Kurtosis: -1.21236752
Mean: 36517.82935
Median Absolute Deviation (MAD): 18490
Skewness: -0.01991297919
Sum: 186606108
Variance: 447818462.1
Monotonicity: Not monotonic
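
Several of the figures above are derived from the others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A short Python sketch that recomputes them from the values reported for id (the variable names are illustrative):

```python
# Values copied from the id statistics above.
minimum, maximum = 67, 72940
q1, q3 = 17741.25, 54682
mean, std = 36517.82935, 21161.72162

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr)          # 72873 36940.75
print(round(cv, 6), round(variance))
```

The same relations hold for the other numeric variables in this report.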
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
16380                1      < 0.1%
58061                1      < 0.1%
69918                1      < 0.1%
23238                1      < 0.1%
4807                 1      < 0.1%
60104                1      < 0.1%
41673                1      < 0.1%
64202                1      < 0.1%
51916                1      < 0.1%
72398                1      < 0.1%
Other values (5100)  5100   99.8%

Value  Count  Frequency (%)
67     1      < 0.1%
77     1      < 0.1%
84     1      < 0.1%
91     1      < 0.1%
99     1      < 0.1%
121    1      < 0.1%
129    1      < 0.1%
132    1      < 0.1%
156    1      < 0.1%
163    1      < 0.1%

Value  Count  Frequency (%)
72940  1      < 0.1%
72918  1      < 0.1%
72915  1      < 0.1%
72914  1      < 0.1%
72911  1      < 0.1%
72882  1      < 0.1%
72867  1      < 0.1%
72861  1      < 0.1%
72836  1      < 0.1%
72824  1      < 0.1%

gender
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 310.4 KiB
Female: 2994
Male: 2115
Other: 1

Length

Max length: 6
Median length: 6
Mean length: 5.172015656
Min length: 4

Characters and Unicode

Total characters: 26429
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Male
2nd row: Female
3rd row: Male
4th row: Female
5th row: Female

Common Values

Value   Count  Frequency (%)
Female  2994   58.6%
Male    2115   41.4%
Other   1      < 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
female2994
58.6%
male2115
41.4%
other1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e8104
30.7%
a5109
19.3%
l5109
19.3%
F2994
 
11.3%
m2994
 
11.3%
M2115
 
8.0%
O1
 
< 0.1%
t1
 
< 0.1%
h1
 
< 0.1%
r1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter21319
80.7%
Uppercase Letter5110
 
19.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e8104
38.0%
a5109
24.0%
l5109
24.0%
m2994
 
14.0%
t1
 
< 0.1%
h1
 
< 0.1%
r1
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
F2994
58.6%
M2115
41.4%
O1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin26429
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e8104
30.7%
a5109
19.3%
l5109
19.3%
F2994
 
11.3%
m2994
 
11.3%
M2115
 
8.0%
O1
 
< 0.1%
t1
 
< 0.1%
h1
 
< 0.1%
r1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII26429
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e8104
30.7%
a5109
19.3%
l5109
19.3%
F2994
 
11.3%
m2994
 
11.3%
M2115
 
8.0%
O1
 
< 0.1%
t1
 
< 0.1%
h1
 
< 0.1%
r1
 
< 0.1%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 104
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.22661448
Minimum: 0.08
Maximum: 82
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 40.0 KiB

Quantile statistics

Minimum: 0.08
5-th percentile: 5
Q1: 25
median: 45
Q3: 61
95-th percentile: 79
Maximum: 82
Range: 81.92
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 22.61264672
Coefficient of variation (CV): 0.5231186156
Kurtosis: -0.9910102432
Mean: 43.22661448
Median Absolute Deviation (MAD): 18
Skewness: -0.1370593226
Sum: 220888
Variance: 511.3317918
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
78                 102    2.0%
57                 95     1.9%
52                 90     1.8%
54                 87     1.7%
51                 86     1.7%
79                 85     1.7%
53                 85     1.7%
45                 85     1.7%
50                 83     1.6%
55                 83     1.6%
Other values (94)  4229   82.8%

Value  Count  Frequency (%)
0.08   2      < 0.1%
0.16   3      0.1%
0.24   5      0.1%
0.32   5      0.1%
0.4    2      < 0.1%
0.48   3      0.1%
0.56   5      0.1%
0.64   4      0.1%
0.72   5      0.1%
0.8    4      0.1%

Value  Count  Frequency (%)
82     56     1.1%
81     60     1.2%
80     70     1.4%
79     85     1.7%
78     102    2.0%
77     42     0.8%
76     50     1.0%
75     53     1.0%
74     40     0.8%
73     46     0.9%

hypertension
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.1 KiB
False: 4612
True: 498

Value  Count  Frequency (%)
False  4612   90.3%
True   498    9.7%

heart_disease
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.1 KiB
False: 4834
True: 276

Value  Count  Frequency (%)
False  4834   94.6%
True   276    5.4%

ever_married
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.1 KiB
True: 3353
False: 1757

Value  Count  Frequency (%)
True   3353   65.6%
False  1757   34.4%

work_type
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 325.7 KiB
Private: 2925
Self-employed: 819
children: 687
Govt_job: 657
Never_worked: 22

Length

Max length: 13
Median length: 7
Mean length: 8.246183953
Min length: 7

Characters and Unicode

Total characters: 42138
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Private
2nd row: Self-employed
3rd row: Private
4th row: Private
5th row: Self-employed

Common Values

Value          Count  Frequency (%)
Private        2925   57.2%
Self-employed  819    16.0%
children       687    13.4%
Govt_job       657    12.9%
Never_worked   22     0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
private2925
57.2%
self-employed819
 
16.0%
children687
 
13.4%
govt_job657
 
12.9%
never_worked22
 
0.4%

Most occurring characters

ValueCountFrequency (%)
e6135
14.6%
r3656
 
8.7%
i3612
 
8.6%
v3604
 
8.6%
t3582
 
8.5%
P2925
 
6.9%
a2925
 
6.9%
l2325
 
5.5%
o2155
 
5.1%
d1528
 
3.6%
Other values (16)9691
23.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter36217
85.9%
Uppercase Letter4423
 
10.5%
Dash Punctuation819
 
1.9%
Connector Punctuation679
 
1.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e6135
16.9%
r3656
10.1%
i3612
10.0%
v3604
10.0%
t3582
9.9%
a2925
8.1%
l2325
 
6.4%
o2155
 
6.0%
d1528
 
4.2%
f819
 
2.3%
Other values (10)5876
16.2%
Uppercase Letter
ValueCountFrequency (%)
P2925
66.1%
S819
 
18.5%
G657
 
14.9%
N22
 
0.5%
Dash Punctuation
ValueCountFrequency (%)
-819
100.0%
Connector Punctuation
ValueCountFrequency (%)
_679
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin40640
96.4%
Common1498
 
3.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e6135
15.1%
r3656
9.0%
i3612
8.9%
v3604
8.9%
t3582
8.8%
P2925
 
7.2%
a2925
 
7.2%
l2325
 
5.7%
o2155
 
5.3%
d1528
 
3.8%
Other values (14)8193
20.2%
Common
ValueCountFrequency (%)
-819
54.7%
_679
45.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII42138
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e6135
14.6%
r3656
 
8.7%
i3612
 
8.6%
v3604
 
8.6%
t3582
 
8.5%
P2925
 
6.9%
a2925
 
6.9%
l2325
 
5.5%
o2155
 
5.1%
d1528
 
3.6%
Other values (16)9691
23.0%

Residence_type
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 309.5 KiB
Urban: 2596
Rural: 2514

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 25550
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Urban
2nd row: Rural
3rd row: Rural
4th row: Urban
5th row: Rural

Common Values

Value  Count  Frequency (%)
Urban  2596   50.8%
Rural  2514   49.2%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
urban2596
50.8%
rural2514
49.2%

Most occurring characters

ValueCountFrequency (%)
r5110
20.0%
a5110
20.0%
U2596
10.2%
b2596
10.2%
n2596
10.2%
R2514
9.8%
u2514
9.8%
l2514
9.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter20440
80.0%
Uppercase Letter5110
 
20.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r5110
25.0%
a5110
25.0%
b2596
12.7%
n2596
12.7%
u2514
12.3%
l2514
12.3%
Uppercase Letter
ValueCountFrequency (%)
U2596
50.8%
R2514
49.2%

Most occurring scripts

ValueCountFrequency (%)
Latin25550
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
r5110
20.0%
a5110
20.0%
U2596
10.2%
b2596
10.2%
n2596
10.2%
R2514
9.8%
u2514
9.8%
l2514
9.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII25550
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r5110
20.0%
a5110
20.0%
U2596
10.2%
b2596
10.2%
n2596
10.2%
R2514
9.8%
u2514
9.8%
l2514
9.8%

avg_glucose_level
Real number (ℝ≥0)

Distinct: 3979
Distinct (%): 77.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 106.1476771
Minimum: 55.12
Maximum: 271.74
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 40.0 KiB

Quantile statistics

Minimum: 55.12
5-th percentile: 60.7135
Q1: 77.245
median: 91.885
Q3: 114.09
95-th percentile: 216.2945
Maximum: 271.74
Range: 216.62
Interquartile range (IQR): 36.845

Descriptive statistics

Standard deviation: 45.28356015
Coefficient of variation (CV): 0.4266090543
Kurtosis: 1.68047854
Mean: 106.1476771
Median Absolute Deviation (MAD): 17.58
Skewness: 1.572283867
Sum: 542414.63
Variance: 2050.60082
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
93.88                6      0.1%
83.16                5      0.1%
91.85                5      0.1%
91.68                5      0.1%
73                   5      0.1%
84.1                 5      0.1%
72.49                5      0.1%
84.4                 4      0.1%
71.06                4      0.1%
84.86                4      0.1%
Other values (3969)  5062   99.1%

Value  Count  Frequency (%)
55.12  1      < 0.1%
55.22  1      < 0.1%
55.23  1      < 0.1%
55.25  1      < 0.1%
55.26  1      < 0.1%
55.27  1      < 0.1%
55.28  1      < 0.1%
55.32  1      < 0.1%
55.34  2      < 0.1%
55.35  1      < 0.1%

Value   Count  Frequency (%)
271.74  1      < 0.1%
267.76  1      < 0.1%
267.61  1      < 0.1%
267.6   1      < 0.1%
266.59  1      < 0.1%
263.56  1      < 0.1%
263.32  1      < 0.1%
261.67  1      < 0.1%
260.85  1      < 0.1%
259.63  1      < 0.1%

bmi
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 418
Distinct (%): 8.5%
Missing: 201
Missing (%): 3.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.89323691
Minimum: 10.3
Maximum: 97.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 40.0 KiB

Quantile statistics

Minimum: 10.3
5-th percentile: 17.64
Q1: 23.5
median: 28.1
Q3: 33.1
95-th percentile: 42.96
Maximum: 97.6
Range: 87.3
Interquartile range (IQR): 9.6

Descriptive statistics

Standard deviation: 7.85406673
Coefficient of variation (CV): 0.2718306278
Kurtosis: 3.362659166
Mean: 28.89323691
Median Absolute Deviation (MAD): 4.7
Skewness: 1.055340205
Sum: 141836.9
Variance: 61.68636419
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
28.7                41     0.8%
28.4                38     0.7%
27.6                37     0.7%
26.7                37     0.7%
27.7                37     0.7%
26.1                37     0.7%
23.4                36     0.7%
27.3                36     0.7%
27                  35     0.7%
26.9                34     0.7%
Other values (408)  4541   88.9%
(Missing)           201    3.9%

Value  Count  Frequency (%)
10.3   1      < 0.1%
11.3   1      < 0.1%
11.5   1      < 0.1%
12     1      < 0.1%
12.3   1      < 0.1%
12.8   1      < 0.1%
13     1      < 0.1%
13.2   1      < 0.1%
13.3   1      < 0.1%
13.4   1      < 0.1%

Value  Count  Frequency (%)
97.6   1      < 0.1%
92     1      < 0.1%
78     1      < 0.1%
71.9   1      < 0.1%
66.8   1      < 0.1%
64.8   1      < 0.1%
64.4   1      < 0.1%
63.3   1      < 0.1%
61.6   1      < 0.1%
61.2   1      < 0.1%

smoking_status
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 334.9 KiB
never smoked: 1892
Unknown: 1544
formerly smoked: 885
smokes: 789

Length

Max length: 15
Median length: 12
Mean length: 10.08238748
Min length: 6

Characters and Unicode

Total characters: 51521
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: formerly smoked
2nd row: never smoked
3rd row: never smoked
4th row: smokes
5th row: never smoked

Common Values

Value            Count  Frequency (%)
never smoked     1892   37.0%
Unknown          1544   30.2%
formerly smoked  885    17.3%
smokes           789    15.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
smoked2777
35.2%
never1892
24.0%
unknown1544
19.6%
formerly885
 
11.2%
smokes789
 
10.0%

Most occurring characters

ValueCountFrequency (%)
e8235
16.0%
n6524
12.7%
o5995
11.6%
k5110
9.9%
m4451
8.6%
s4355
8.5%
r3662
7.1%
2777
 
5.4%
d2777
 
5.4%
v1892
 
3.7%
Other values (5)5743
11.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter47200
91.6%
Space Separator2777
 
5.4%
Uppercase Letter1544
 
3.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e8235
17.4%
n6524
13.8%
o5995
12.7%
k5110
10.8%
m4451
9.4%
s4355
9.2%
r3662
7.8%
d2777
 
5.9%
v1892
 
4.0%
w1544
 
3.3%
Other values (3)2655
 
5.6%
Space Separator
ValueCountFrequency (%)
2777
100.0%
Uppercase Letter
ValueCountFrequency (%)
U1544
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin48744
94.6%
Common2777
 
5.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e8235
16.9%
n6524
13.4%
o5995
12.3%
k5110
10.5%
m4451
9.1%
s4355
8.9%
r3662
7.5%
d2777
 
5.7%
v1892
 
3.9%
U1544
 
3.2%
Other values (4)4199
8.6%
Common
ValueCountFrequency (%)
2777
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII51521
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e8235
16.0%
n6524
12.7%
o5995
11.6%
k5110
9.9%
m4451
8.6%
s4355
8.5%
r3662
7.1%
2777
 
5.4%
d2777
 
5.4%
v1892
 
3.7%
Other values (5)5743
11.1%

stroke
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.1 KiB
False: 4861
True: 249

Value  Count  Frequency (%)
False  4861   95.1%
True   249    4.9%

Interactions

[Figures: 16 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
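
A minimal Python sketch of this calculation, using only the standard library. The function name `pearson_r` and the sample values are illustrative and are not taken from the profiled dataset:

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least two points")
    mx, my = mean(xs), mean(ys)
    # Population covariance, paired with population standard deviations below.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    # A constant input has zero standard deviation and makes r undefined.
    return cov / (pstdev(xs) * pstdev(ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)

# Shifting and rescaling one variable leaves r unchanged.
r_shifted = pearson_r(xs, [3 * y + 7 for y in ys])
print(r, r_shifted)
```

Both calls give approximately 0.8, which illustrates the invariance under location and scale described above.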

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
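
The same recipe can be sketched in Python by ranking each variable and then applying the covariance formula to the ranks. The helper names below are illustrative, and giving tied values the average of their positions is an assumption of this sketch (a common convention), not something the report states:

```python
from statistics import mean, pstdev

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / len(rx)
    return cov / (pstdev(rx) * pstdev(ry))

# A monotonic but nonlinear relationship (y = x cubed).
rho = spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])
print(rho)
```

Because only the ordering matters, the cubic relationship yields a ρ of approximately 1 even though it is not linear.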

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
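
The pair-counting definition translates directly into a short Python sketch. It implements the plain form described above (often called tau-a), in which tied pairs count as neither concordant nor discordant. Library implementations often report the tie-adjusted tau-b variant instead, so values can differ when the data contain ties. The function name is illustrative:

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least two points")
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Six pairs in total: five concordant, one discordant (the swapped 3 and 2).
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

Here τ is (5 - 1) / 6, roughly 0.67.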

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
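
A sketch of the bias-corrected estimator in plain Python, following the correction as it is commonly stated for Bergsma's proposal. The function name and the contingency tables are hypothetical, and the sketch assumes every row and column of the table has a non-zero total:

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected value of phi2 under independence, floored at zero.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(r_corr - 1, k_corr - 1))

independent = cramers_v_corrected([[10, 10], [10, 10]])
perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(independent, perfect)
```

The first table shows no association and gives 0, while the second, in which each row determines the column, gives a value of approximately 1.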

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

      id     gender  age   hypertension  heart_disease  ever_married  work_type      Residence_type  avg_glucose_level  bmi   smoking_status   stroke
0     9046   Male    67.0  False         True           Yes           Private        Urban           228.69             36.6  formerly smoked  True
1     51676  Female  61.0  False         False          Yes           Self-employed  Rural           202.21             NaN   never smoked     True
2     31112  Male    80.0  False         True           Yes           Private        Rural           105.92             32.5  never smoked     True
3     60182  Female  49.0  False         False          Yes           Private        Urban           171.23             34.4  smokes           True
4     1665   Female  79.0  True          False          Yes           Self-employed  Rural           174.12             24.0  never smoked     True
5     56669  Male    81.0  False         False          Yes           Private        Urban           186.21             29.0  formerly smoked  True
6     53882  Male    74.0  True          True           Yes           Private        Rural           70.09              27.4  never smoked     True
7     10434  Female  69.0  False         False          No            Private        Urban           94.39              22.8  never smoked     True
8     27419  Female  59.0  False         False          Yes           Private        Rural           76.15              NaN   Unknown          True
9     60491  Female  78.0  False         False          Yes           Private        Urban           58.57              24.2  Unknown          True

Last rows

      id     gender  age   hypertension  heart_disease  ever_married  work_type      Residence_type  avg_glucose_level  bmi   smoking_status   stroke
5100  68398  Male    82.0  True          False          Yes           Self-employed  Rural           71.97              28.3  never smoked     False
5101  36901  Female  45.0  False         False          Yes           Private        Urban           97.95              24.5  Unknown          False
5102  45010  Female  57.0  False         False          Yes           Private        Rural           77.93              21.7  never smoked     False
5103  22127  Female  18.0  False         False          No            Private        Urban           82.85              46.9  Unknown          False
5104  14180  Female  13.0  False         False          No            children       Rural           103.08             18.6  Unknown          False
5105  18234  Female  80.0  True          False          Yes           Private        Urban           83.75              NaN   never smoked     False
5106  44873  Female  81.0  False         False          Yes           Self-employed  Urban           125.20             40.0  never smoked     False
5107  19723  Female  35.0  False         False          Yes           Self-employed  Rural           82.99              30.6  never smoked     False
5108  37544  Male    51.0  False         False          Yes           Private        Rural           166.29             25.6  formerly smoked  False
5109  44679  Female  44.0  False         False          Yes           Govt_job       Urban           85.28              26.2  Unknown          False