What data science articles attract more attention from readers (Part 2)

In this series of articles, we analyse historical archives of data science publications to understand which topics are most popular with readers. Previously, we covered how to get the data that will be used for further analysis.

We will cover how to clean the text data we collected earlier…

Exploring ways of calculating the distance in the hope of finding a high-performing solution for large data sets

Euclidean distance is one of the most commonly used metrics, serving as a basis for many machine learning algorithms. However, when one is faced with very large data sets containing multiple features, the simple distance calculation becomes a source of headaches and memory errors.
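To make the memory problem concrete, here is a minimal sketch in plain Python, not necessarily the approach the article goes on to use. The names `euclidean` and `pairwise_rows` are hypothetical and chosen only for illustration. The idea is to produce the distance matrix one row at a time, so that the full matrix never has to be held in memory at once.

```python
import math
from typing import Iterator, List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Square root of the sum of squared feature differences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_rows(xs: Sequence[Point], ys: Sequence[Point]) -> Iterator[List[float]]:
    """Yield one row of the distance matrix at a time.

    Only a single row of len(ys) values is alive at any moment,
    instead of the full len(xs) * len(ys) matrix.
    """
    for a in xs:
        yield [euclidean(a, b) for b in ys]


# Hypothetical usage: consume each row as it is produced.
for row in pairwise_rows([[0.0, 0.0], [1.0, 1.0]], [[3.0, 4.0]]):
    print(row)
```

This trades speed for memory: each row is computed in a Python loop, which is slow for large inputs, but peak memory stays proportional to one row rather than to the whole matrix.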

Although being aware that packages like…


