**Introduction**

Document similarity is one of the most crucial problems in NLP. Finding similar documents is useful in several domains, such as recommending similar books and articles, identifying plagiarised documents, comparing legal documents, etc.

We can call two documents similar if they are semantically similar and define the same concept…

Hypothesis testing, or A/B testing, is a crucial step before integrating a data science solution or any feature update into a product. It is a statistical way of measuring the impact of the feature we are trying to add or update.

According to Wikipedia,

*A/B testing is a…*

**What is cosine similarity?**

Cosine similarity measures the similarity between two vectors by calculating the cosine of the angle between the two vectors.
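As a minimal sketch of the definition above: for two vectors a and b, the cosine of the angle between them is their dot product divided by the product of their lengths. The function name `cosine_similarity` and the small term-count vectors below are illustrative assumptions, not taken from the article.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical term-count vectors for three short documents.
doc_a = [1, 2, 0, 1]
doc_b = [2, 4, 0, 2]  # same direction as doc_a, twice the magnitude
doc_c = [0, 0, 3, 0]  # shares no terms with doc_a

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```

Here `sim_ab` comes out at (approximately) 1 because the two vectors point in the same direction, and `sim_ac` is 0 because they have no non-zero component in common. The measure depends only on direction, not on vector length, which is why a long and a short document with the same term proportions score as highly similar.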

Cosine similarity is one of the most widely used and powerful similarity measures in Data Science. It is used in multiple applications such as finding similar documents…

**Introduction**

tf-idf, which stands for term frequency-inverse document frequency, is used to compute a numerical summary of a document, which can then be used to find similar documents, classify documents, etc.
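To make the idea concrete, here is a rough sketch of one common textbook variant, assuming tf is the term count divided by the document length and idf is ln(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tf_idf` and the toy corpus are hypothetical; other variants (smoothing, log-scaled tf, normalisation) give different numbers.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus, already lower-cased and split on whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(corpus)
```

In this toy corpus, "the" appears in every document, so its idf is ln(3/3) = 0 and its weight is 0 everywhere. In the first document, "mat" (found in one document) gets a higher weight than "cat" (found in two), which is the behaviour tf-idf is meant to capture: terms that are distinctive for a document count for more.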

This article explains tf-idf, its variations, and the impact of these variations on the model…

Principal Data Scientist | Machine Learning | Deep Learning | NLP | www.linkedin.com/in/varun21290