Dataset statistics
| Number of variables | 8 |
|---|---|
| Number of observations | 541909 |
| Missing cells | 136534 |
| Missing cells (%) | 3.1% |
| Duplicate rows | 5268 |
| Duplicate rows (%) | 1.0% |
| Total size in memory | 33.1 MiB |
| Average record size in memory | 64.0 B |
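The percentages and memory figures above can be cross-checked from the raw counts. The following is a minimal sketch of that arithmetic. It assumes missing cells are measured against all cells (rows × columns), duplicates against all rows, and that every value occupies 8 bytes; the last point is an assumption that happens to match the 64.0 B average record size.

```python
# Figures copied from the dataset statistics table above.
n_vars = 8
n_obs = 541_909
missing_cells = 136_534
duplicate_rows = 5_268

# Missing cells are measured against every cell in the table (rows x columns).
missing_pct = 100 * missing_cells / (n_obs * n_vars)

# Duplicate rows are measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_obs

# Assumption: 8 bytes per value, so a record of 8 values takes 64 bytes.
bytes_per_record = 8 * n_vars
total_mib = n_obs * bytes_per_record / 2**20

print(f"{missing_pct:.1f}% missing, {duplicate_pct:.1f}% duplicates, {total_mib:.1f} MiB")
```

Under these assumptions the three values round to 3.1%, 1.0% and 33.1 MiB, in agreement with the table.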
Variable types
| Categorical | 5 |
|---|---|
| Numeric | 3 |
Warnings
| Dataset has 5268 (1.0%) duplicate rows | Duplicates |
|---|---|
| InvoiceNo has a high cardinality: 25900 distinct values | High cardinality |
| StockCode has a high cardinality: 4070 distinct values | High cardinality |
| Description has a high cardinality: 4223 distinct values | High cardinality |
| InvoiceDate has a high cardinality: 23260 distinct values | High cardinality |
| CustomerID has 135080 (24.9%) missing values | Missing |
| UnitPrice is highly skewed (γ1 = 186.5069717) | Skewed |
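The skewness warning reports a coefficient γ1. As a rough illustration of what such a coefficient measures, here is a sketch of the moment-based skewness in plain Python. The helper name `skewness` and the toy data are hypothetical, and the report's own figure may come from a bias-adjusted variant, so small-sample values need not match exactly.

```python
from statistics import fmean

def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias adjustment)."""
    mean = fmean(values)
    m2 = fmean([(x - mean) ** 2 for x in values])  # second central moment
    m3 = fmean([(x - mean) ** 3 for x in values])  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 1, 2, 2, 3, 50]  # toy data, not from the dataset

print(skewness(symmetric))             # 0.0 for a symmetric sample
print(skewness(long_right_tail) > 0)   # True: a long right tail gives positive skew
```

A few very large values are enough to push the coefficient far above zero, which is consistent with the UnitPrice maximum of 38970 against a median of 2.08.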
Reproduction
| Analysis started | 2021-11-29 09:08:12.098007 |
|---|---|
| Analysis finished | 2021-11-29 09:08:16.494103 |
| Duration | 4.4 seconds |
| Software version | pandas-profiling v2.11.0 |
InvoiceNo
Categorical
| Distinct | 25900 |
|---|---|
| Distinct (%) | 4.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.1 MiB |
| 573585 | 1114 |
|---|---|
| 581219 | 749 |
| 581492 | 731 |
| 580729 | 721 |
| 558475 | 705 |
| Other values (25895) | 537889 |
Unique
| Unique | 5841 |
|---|---|
| Unique (%) | 1.1% |
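"Distinct" and "Unique" are easy to confuse. Judging by the figures (25900 distinct against 5841 unique), distinct counts the different values present, while unique counts the values that occur exactly once. A small sketch under that reading, with a hypothetical helper and toy invoice numbers:

```python
from collections import Counter

def distinct_and_unique(values):
    counts = Counter(values)
    n_distinct = len(counts)                               # different values present
    n_unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once
    return n_distinct, n_unique

# Toy sample; only the first invoice number appears in the report itself.
sample = ["536365", "536365", "536366", "536367", "536367", "536368"]
print(distinct_and_unique(sample))  # (4, 2)
```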
Sample
| 1st row | 536365 |
|---|---|
| 2nd row | 536365 |
| 3rd row | 536365 |
| 4th row | 536365 |
| 5th row | 536365 |
| Value | Count | Frequency (%) |
| 573585 | 1114 | 0.2% |
| 581219 | 749 | 0.1% |
| 581492 | 731 | 0.1% |
| 580729 | 721 | 0.1% |
| 558475 | 705 | 0.1% |
| 579777 | 687 | 0.1% |
| 581217 | 676 | 0.1% |
| 537434 | 675 | 0.1% |
| 580730 | 662 | 0.1% |
| 538071 | 652 | 0.1% |
| Other values (25890) | 534537 | 98.6% |
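The Value / Count / Frequency tables in this report can be reproduced with a simple tally. The sketch below assumes the top values are listed by count and everything else is folded into an "Other values (n)" row; the function name `common_values` is hypothetical, and missing values are not handled.

```python
from collections import Counter

def common_values(values, top=10):
    """Return (value, count, percent) rows for the most frequent values."""
    total = len(values)
    counts = Counter(values)
    rows = [(v, c, 100 * c / total) for v, c in counts.most_common(top)]
    shown = sum(c for _, c, _ in rows)
    n_other = len(counts) - len(rows)
    if n_other > 0:
        # Fold the remaining distinct values into a single summary row.
        rest = total - shown
        rows.append((f"Other values ({n_other})", rest, 100 * rest / total))
    return rows

toy = ["A"] * 5 + ["B"] * 3 + ["C", "D"]  # toy data, not from the dataset
for row in common_values(toy, top=2):
    print(row)
```

Applied to the figures above, the same arithmetic gives 541909 − 7372 = 534537 rows for the 25890 invoice numbers outside the top ten.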
StockCode
Categorical
| Distinct | 4070 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.1 MiB |
| 85123A | 2313 |
|---|---|
| 22423 | 2203 |
| 85099B | 2159 |
| 47566 | 1727 |
| 20725 | 1639 |
| Other values (4065) | 531868 |
Unique
| Unique | 233 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 85123A |
|---|---|
| 2nd row | 71053 |
| 3rd row | 84406B |
| 4th row | 84029G |
| 5th row | 84029E |
| Value | Count | Frequency (%) |
| 85123A | 2313 | 0.4% |
| 22423 | 2203 | 0.4% |
| 85099B | 2159 | 0.4% |
| 47566 | 1727 | 0.3% |
| 20725 | 1639 | 0.3% |
| 84879 | 1502 | 0.3% |
| 22720 | 1477 | 0.3% |
| 22197 | 1476 | 0.3% |
| 21212 | 1385 | 0.3% |
| 20727 | 1350 | 0.2% |
| Other values (4060) | 524678 | 96.8% |
Description
Categorical
| Distinct | 4223 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 1454 |
| Missing (%) | 0.3% |
| Memory size | 4.1 MiB |
| WHITE HANGING HEART T-LIGHT HOLDER | 2369 |
|---|---|
| REGENCY CAKESTAND 3 TIER | 2200 |
| JUMBO BAG RED RETROSPOT | 2159 |
| PARTY BUNTING | 1727 |
| LUNCH BAG RED RETROSPOT | 1638 |
| Other values (4218) | 530362 |
Unique
| Unique | 308 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | WHITE HANGING HEART T-LIGHT HOLDER |
|---|---|
| 2nd row | WHITE METAL LANTERN |
| 3rd row | CREAM CUPID HEARTS COAT HANGER |
| 4th row | KNITTED UNION FLAG HOT WATER BOTTLE |
| 5th row | RED WOOLLY HOTTIE WHITE HEART. |
| Value | Count | Frequency (%) |
| WHITE HANGING HEART T-LIGHT HOLDER | 2369 | 0.4% |
| REGENCY CAKESTAND 3 TIER | 2200 | 0.4% |
| JUMBO BAG RED RETROSPOT | 2159 | 0.4% |
| PARTY BUNTING | 1727 | 0.3% |
| LUNCH BAG RED RETROSPOT | 1638 | 0.3% |
| ASSORTED COLOUR BIRD ORNAMENT | 1501 | 0.3% |
| SET OF 3 CAKE TINS PANTRY DESIGN | 1473 | 0.3% |
| PACK OF 72 RETROSPOT CAKE CASES | 1385 | 0.3% |
| LUNCH BAG BLACK SKULL. | 1350 | 0.2% |
| NATURAL SLATE HEART CHALKBOARD | 1280 | 0.2% |
| Other values (4213) | 523373 | 96.6% |
| (Missing) | 1454 | 0.3% |
Quantity
Real number (ℝ)
| Distinct | 722 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.552249547 |
|---|---|
| Minimum | -80995 |
| Maximum | 80995 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.1 MiB |
Quantile statistics
| Minimum | -80995 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 3 |
| Q3 | 10 |
| 95-th percentile | 29 |
| Maximum | 80995 |
| Range | 161990 |
| Interquartile range (IQR) | 9 |
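Range and interquartile range are derived from the other rows: range = maximum − minimum (80995 − (−80995) = 161990) and IQR = Q3 − Q1 (10 − 1 = 9). The sketch below computes the same summary on toy data using the standard library. The interpolation method (`"inclusive"`) is an assumption; the profiling tool may interpolate differently.

```python
from statistics import quantiles

def quantile_summary(values):
    # Quartiles with linear interpolation between order statistics.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "min": lo, "Q1": q1, "median": q2, "Q3": q3, "max": hi,
        "range": hi - lo,   # maximum minus minimum
        "IQR": q3 - q1,     # spread of the middle 50% of the data
    }

toy = [1, 1, 2, 3, 3, 6, 10, 12, 24]  # toy data, not from the dataset
summary = quantile_summary(toy)
print(summary["IQR"], summary["range"])  # 8.0 23
```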
Descriptive statistics
| Standard deviation | 218.0811579 |
|---|---|
| Coefficient of variation (CV) | 22.83034554 |
| Kurtosis | 119769.16 |
| Mean | 9.552249547 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.2640763071 |
| Sum | 5176450 |
| Variance | 47559.39141 |
| Monotonicity | Not monotonic |
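Several of the descriptive statistics are linked by simple identities: the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks both against the figures above and adds hypothetical helpers for the median absolute deviation and the monotonicity test, written from their textbook definitions rather than from the profiling tool's source.

```python
from statistics import median

def mad(values):
    """Median absolute deviation: median of |x - median(values)|."""
    m = median(values)
    return median([abs(x - m) for x in values])

def is_monotonic(values):
    """True if the sequence never decreases or never increases."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

# Figures copied from the Quantity statistics above.
std, mean = 218.0811579, 9.552249547
cv = std / mean        # coefficient of variation, about 22.83
variance = std ** 2    # about 47559.39

toy = [1, 1, 2, 3, 10]  # toy data, not from the dataset
print(mad(toy), is_monotonic(toy), is_monotonic([3, 1, 2]))  # 1 True False
```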
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 148227 | 27.4% |
| 2 | 81829 | 15.1% |
| 12 | 61063 | 11.3% |
| 6 | 40868 | 7.5% |
| 4 | 38484 | 7.1% |
| 3 | 37121 | 6.9% |
| 24 | 24021 | 4.4% |
| 10 | 22288 | 4.1% |
| 8 | 13129 | 2.4% |
| 5 | 11757 | 2.2% |
| Other values (712) | 63122 | 11.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
| -80995 | 1 | < 0.1% |
| -74215 | 1 | < 0.1% |
| -9600 | 2 | < 0.1% |
| -9360 | 1 | < 0.1% |
| -9058 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
| 80995 | 1 | < 0.1% |
| 74215 | 1 | < 0.1% |
| 12540 | 1 | < 0.1% |
| 5568 | 1 | < 0.1% |
| 4800 | 1 | < 0.1% |
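The Quantity histogram uses 50 fixed-size bins. A minimal sketch of fixed-width binning follows; the function name is hypothetical and the code assumes the data contain at least two different values.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # The maximum sits on the right edge, so clamp it into the last bin.
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    return counts

toy = list(range(11))  # 0..10, toy data
print(fixed_width_histogram(toy, bins=5))  # [2, 2, 2, 2, 3]
```

With a range of 161990 split into 50 bins, each bin is about 3240 units wide, so under this scheme almost every row would land in the two bins either side of zero. That is why extreme values such as ±80995 dominate the shape of such a plot.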
InvoiceDate
Categorical
| Distinct | 23260 |
|---|---|
| Distinct (%) | 4.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.1 MiB |
| 2011/10/31 14:41 | 1114 |
|---|---|
| 2011/12/8 9:28 | 749 |
| 2011/12/9 10:03 | 731 |
| 2011/12/5 17:24 | 721 |
| 2011/6/29 15:58 | 705 |
| Other values (23255) | 537889 |
Unique
| Unique | 4242 |
|---|---|
| Unique (%) | 0.8% |
Sample
| 1st row | 2010/12/1 8:26 |
|---|---|
| 2nd row | 2010/12/1 8:26 |
| 3rd row | 2010/12/1 8:26 |
| 4th row | 2010/12/1 8:26 |
| 5th row | 2010/12/1 8:26 |
| Value | Count | Frequency (%) |
| 2011/10/31 14:41 | 1114 | 0.2% |
| 2011/12/8 9:28 | 749 | 0.1% |
| 2011/12/9 10:03 | 731 | 0.1% |
| 2011/12/5 17:24 | 721 | 0.1% |
| 2011/6/29 15:58 | 705 | 0.1% |
| 2011/11/30 15:13 | 687 | 0.1% |
| 2011/12/8 9:20 | 676 | 0.1% |
| 2010/12/6 16:57 | 675 | 0.1% |
| 2011/12/5 17:28 | 662 | 0.1% |
| 2010/12/9 14:09 | 652 | 0.1% |
| Other values (23250) | 534537 | 98.6% |
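InvoiceDate is profiled as a categorical variable with high cardinality, which suggests the column was read as text rather than as timestamps. Converting it before profiling would likely give more useful statistics. The sketch below parses the format seen in the sample rows; the format string is an assumption inferred from values such as `2010/12/1 8:26`, and the helper name is hypothetical.

```python
from datetime import datetime

# Assumed layout: year/month/day hour:minute, without zero padding.
INVOICE_DATE_FORMAT = "%Y/%m/%d %H:%M"

def parse_invoice_date(text):
    """Parse an InvoiceDate string such as '2011/10/31 14:41'."""
    return datetime.strptime(text, INVOICE_DATE_FORMAT)

print(parse_invoice_date("2011/10/31 14:41"))  # 2011-10-31 14:41:00
print(parse_invoice_date("2010/12/1 8:26"))    # 2010-12-01 08:26:00
```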
UnitPrice
Real number (ℝ)
| Distinct | 1630 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.611113626 |
|---|---|
| Minimum | -11062.06 |
| Maximum | 38970 |
| Zeros | 2515 |
| Zeros (%) | 0.5% |
| Memory size | 4.1 MiB |
Quantile statistics
| Minimum | -11062.06 |
|---|---|
| 5-th percentile | 0.42 |
| Q1 | 1.25 |
| median | 2.08 |
| Q3 | 4.13 |
| 95-th percentile | 9.95 |
| Maximum | 38970 |
| Range | 50032.06 |
| Interquartile range (IQR) | 2.88 |
Descriptive statistics
| Standard deviation | 96.75985306 |
|---|---|
| Coefficient of variation (CV) | 20.98405307 |
| Kurtosis | 59005.7191 |
| Mean | 4.611113626 |
| Median Absolute Deviation (MAD) | 1.23 |
| Skewness | 186.5069717 |
| Sum | 2498803.974 |
| Variance | 9362.469164 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1.25 | 50496 | 9.3% |
| 1.65 | 38181 | 7.0% |
| 0.85 | 28497 | 5.3% |
| 2.95 | 27768 | 5.1% |
| 0.42 | 24533 | 4.5% |
| 4.95 | 19040 | 3.5% |
| 3.75 | 18600 | 3.4% |
| 2.1 | 17697 | 3.3% |
| 2.46 | 17091 | 3.2% |
| 2.08 | 17005 | 3.1% |
| Other values (1620) | 283001 | 52.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
| -11062.06 | 2 | < 0.1% |
| 0 | 2515 | 0.5% |
| 0.001 | 4 | < 0.1% |
| 0.01 | 1 | < 0.1% |
| 0.03 | 3 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
| 38970 | 1 | < 0.1% |
| 17836.46 | 1 | < 0.1% |
| 16888.02 | 1 | < 0.1% |
| 16453.71 | 1 | < 0.1% |
| 13541.33 | 3 | < 0.1% |
CustomerID
Real number (ℝ≥0)
| Distinct | 4372 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 135080 |
| Missing (%) | 24.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15287.69057 |
|---|---|
| Minimum | 12346 |
| Maximum | 18287 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 4.1 MiB |
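With a quarter of the values missing, it matters which denominator each figure uses. The numbers above are consistent with Missing (%) being taken over all rows, and Distinct (%) and the mean being taken over non-missing rows only. This is an inference from the arithmetic, not a statement about the tool's documented behaviour. The sum used below, 6219475867, is the Sum reported in this variable's descriptive statistics.

```python
# Figures copied from the CustomerID statistics in this report.
n_obs = 541_909
n_missing = 135_080
n_distinct = 4_372
total_sum = 6_219_475_867

n_present = n_obs - n_missing                 # rows with a CustomerID

missing_pct = 100 * n_missing / n_obs         # over all rows
distinct_pct = 100 * n_distinct / n_present   # over non-missing rows
distinct_pct_all = 100 * n_distinct / n_obs   # for comparison only
mean = total_sum / n_present                  # missing rows are excluded

print(f"{missing_pct:.1f}% {distinct_pct:.1f}% {distinct_pct_all:.1f}% {mean:.2f}")
```

Only the non-missing denominator reproduces the reported 1.1%; dividing by all rows would give 0.8%.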
Quantile statistics
| Minimum | 12346 |
|---|---|
| 5-th percentile | 12626 |
| Q1 | 13953 |
| median | 15152 |
| Q3 | 16791 |
| 95-th percentile | 17905 |
| Maximum | 18287 |
| Range | 5941 |
| Interquartile range (IQR) | 2838 |
Descriptive statistics
| Standard deviation | 1713.600303 |
|---|---|
| Coefficient of variation (CV) | 0.1120902006 |
| Kurtosis | -1.179982372 |
| Mean | 15287.69057 |
| Median Absolute Deviation (MAD) | 1481 |
| Skewness | 0.02983499005 |
| Sum | 6219475867 |
| Variance | 2936426 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 17841 | 7983 | 1.5% |
| 14911 | 5903 | 1.1% |
| 14096 | 5128 | 0.9% |
| 12748 | 4642 | 0.9% |
| 14606 | 2782 | 0.5% |
| 15311 | 2491 | 0.5% |
| 14646 | 2085 | 0.4% |
| 13089 | 1857 | 0.3% |
| 13263 | 1677 | 0.3% |
| 14298 | 1640 | 0.3% |
| Other values (4362) | 370641 | 68.4% |
| (Missing) | 135080 | 24.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
| 12346 | 2 | < 0.1% |
| 12347 | 182 | < 0.1% |
| 12348 | 31 | < 0.1% |
| 12349 | 73 | < 0.1% |
| 12350 | 17 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
| 18287 | 70 | < 0.1% |
| 18283 | 756 | 0.1% |
| 18282 | 13 | < 0.1% |
| 18281 | 7 | < 0.1% |
| 18280 | 10 | < 0.1% |
Country
Categorical
| Distinct | 38 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.1 MiB |
| United Kingdom | 495478 |
|---|---|
| Germany | 9495 |
| France | 8557 |
| EIRE | 8196 |
| Spain | 2533 |
| Other values (33) | 17650 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | United Kingdom |
|---|---|
| 2nd row | United Kingdom |
| 3rd row | United Kingdom |
| 4th row | United Kingdom |
| 5th row | United Kingdom |
| Value | Count | Frequency (%) |
| United Kingdom | 495478 | 91.4% |
| Germany | 9495 | 1.8% |
| France | 8557 | 1.6% |
| EIRE | 8196 | 1.5% |
| Spain | 2533 | 0.5% |
| Netherlands | 2371 | 0.4% |
| Belgium | 2069 | 0.4% |
| Switzerland | 2002 | 0.4% |
| Portugal | 1519 | 0.3% |
| Australia | 1259 | 0.2% |
| Other values (28) | 8430 | 1.6% |