Sentiment Analysis

Sentiment Analysis using TextBlob

This step performs sentiment analysis on the prepared sentences, adding sentiment scores and categories (positive, negative, neutral) to the DataFrame. It then visualizes the distribution of sentiments across all sentences, giving a view of the overall tone of the crypto news articles.

The libraries used:

TextBlob: https://textblob.readthedocs.io/en/dev/

A simple NLP library that provides tools for tasks such as sentiment analysis, part-of-speech tagging, and text processing.

It offers an easy-to-use interface for sentiment analysis, calculating polarity (a float from -1.0, negative, to 1.0, positive) and subjectivity (a float from 0.0, objective, to 1.0, subjective) without requiring any model training.

The script analyzes each raw sentence x with TextBlob(x).sentiment, which exposes both polarity and subjectivity.

Used in analyze_crypto_sentiment() to create a TextBlob object and compute polarity via blob.sentiment.polarity.

- Simplicity: Requires no model training, which makes it well suited to quick sentiment analysis compared with heavier libraries such as transformers.

- Accuracy: TextBlob's default analyzer is lexicon-based (derived from the pattern library), so it works reasonably well on general English but can miss crypto-specific nuance (for example, whether "mining" reads as positive or negative).

- Context in Script: Without TextBlob, sentiment analysis would require a custom model or an external API, increasing complexity. Its ease of use makes it a good fit for rapid prototyping and visualization.

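Example: TextBlob("Dogecoin prices rose today!").sentiment returns a (polarity, subjectivity) pair. The values below are illustrative; the actual numbers depend on which words TextBlob's lexicon covers.

- polarity: ~0.25 (positive, as "rose" suggests growth).
- subjectivity: ~0.2 (fairly objective).

Category: positive (since 0.25 > 0.1, the cutoff used for the positive category).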
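The mapping from polarity to category can be sketched as follows. This is a minimal sketch, not the script's actual code: the 0.1 cutoff for "positive" comes from the example above, while the symmetric -0.1 cutoff for "negative" and the helper names categorize and score_sentence are assumptions made for illustration.

```python
def categorize(polarity, threshold=0.1):
    """Map a polarity in [-1.0, 1.0] to a sentiment category.

    The symmetric negative cutoff (-threshold) is an assumption.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def score_sentence(text):
    """Score one sentence with TextBlob (requires `pip install textblob`)."""
    # Imported here so that categorize() stays usable without TextBlob installed.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment  # namedtuple: (polarity, subjectivity)
    return {
        "polarity": sentiment.polarity,
        "subjectivity": sentiment.subjectivity,
        "category": categorize(sentiment.polarity),
    }
```

Calling score_sentence("Dogecoin prices rose today!") returns a dict with the three fields; the exact polarity and subjectivity depend on TextBlob's lexicon.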

matplotlib.pyplot: https://matplotlib.org/3.5.3/api/_as_gen/matplotlib.pyplot.html
A plotting library for creating visualizations such as bar charts and line graphs. It is widely used for quick, customizable data visualization, which suits a sentiment distribution plot.

Used in the main execution to plot the sentiment distribution with plt.bar(): the script counts the sentences in each category (e.g., 50 positive, 30 neutral, 20 negative) and draws one bar per category:

- Green bar (positive)
- Gray bar (neutral)
- Red bar (negative)

Without matplotlib, inspecting the sentiment distribution would require manual counting or external software. Plotting it directly in the script makes the overall tone visible at a glance.

Example:

import matplotlib.pyplot as plt

sentiment_counts = {"Positive": 2, "Neutral": 1, "Negative": 0}

plt.bar(
    sentiment_counts.keys(),
    sentiment_counts.values(),
    color=["green", "gray", "red"]
)

plt.show()
# Displays a bar chart with 2 positive, 1 neutral, 0 negative
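The example above hard-codes the counts. A minimal sketch of deriving them from per-sentence labels follows, assuming the labels are available as a plain list. The variable names are hypothetical, and the script itself may count them differently (for instance, directly on the DataFrame).

```python
from collections import Counter

# Hypothetical per-sentence labels, standing in for the DataFrame's category column.
labels = ["positive", "neutral", "positive", "negative", "positive"]

# Fixed order so that bars and colors always line up, even for empty categories.
categories = ["positive", "neutral", "negative"]
counts = Counter(labels)  # missing categories count as 0
sentiment_counts = {name.capitalize(): counts[name] for name in categories}

print(sentiment_counts)  # {'Positive': 3, 'Neutral': 1, 'Negative': 1}
```

The resulting dict has the same shape as sentiment_counts in the plotting example, so its keys and values can be passed to plt.bar() in the same way.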

Implementation of BM25

This step creates a searchable index of preprocessed sentences using the BM25 algorithm, processes user queries, and retrieves the most relevant sentences as answers. It integrates sentiment data from Step 5 and provides a ranked list of responses with metadata.

Libraries used:

rank_bm25: https://pypi.org/project/rank-bm25/

Implements the BM25 (Best Matching 25) algorithm, a ranking function used in information retrieval to score how relevant each document is to the terms of a query.

BM25 is efficient and effective for text search: it weighs how often a query term appears in a document, how rare that term is across the corpus, and how long the document is. That makes it a good fit for retrieving relevant sentences without a machine learning model.

- BM25Okapi: Creates an index from tokenized sentences and scores them against queries. CryptoAnswerRetrieval builds the index with self.bm25 = BM25Okapi(...) over the tokenized sentences and scores a tokenized query with self.bm25.get_scores(...).
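To make the scoring idea concrete, here is a dependency-free sketch of a BM25-style score. It is an illustration under stated assumptions, not rank_bm25's implementation: the function name bm25_scores is hypothetical, the k1 and b values are common defaults, and the smoothed IDF used here is one common variant that differs slightly from the one rank_bm25 applies.

```python
import math
from collections import Counter


def bm25_scores(corpus, query, k1=1.5, b=0.75):
    """Score every tokenized document in `corpus` against the tokenized `query`."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))

    scores = []
    for doc in corpus:
        term_freq = Counter(doc)
        # Longer-than-average documents get a larger normalizer, lowering their score.
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue  # a term absent from the document contributes nothing
            # Smoothed IDF that stays non-negative (an assumed variant).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


corpus = [["bitcoin", "mining"], ["ethereum", "crash"]]
scores = bm25_scores(corpus, ["bitcoin"])
print(scores)  # first score is positive, second is 0.0
```

Rarer terms get a larger IDF weight, repeated terms give diminishing returns through k1, and b controls how strongly document length is penalized.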

The process followed:

Creating the BM25 Index (Function: create_bm25_index):

- Convert preprocessed sentences into a tokenized corpus.
- Initialize a BM25Okapi index with the corpus.

Query Preprocessing (Function: preprocess_query):

Apply the same preprocessing to the user’s query to ensure consistency with the indexed data.
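As a rough illustration of query preprocessing, the sketch below lowercases the query, strips punctuation, and drops stop words. These particular steps and the stop-word list are assumptions made for the example; the real preprocess_query should mirror whatever the data preprocessing step applied to the indexed sentences.

```python
import string

# Hypothetical stop-word list; the real pipeline may use a different one or none at all.
STOP_WORDS = {"what", "is", "the", "in", "a", "an", "of"}


def preprocess_query(query):
    """Lowercase, strip punctuation, and drop stop words; returns a cleaned string."""
    lowered = query.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    kept = [word for word in no_punct.split() if word not in STOP_WORDS]
    return " ".join(kept)


cleaned = preprocess_query("What is the latest trend in Bitcoin price?")
query_tokens = cleaned.split()
print(query_tokens)  # ['latest', 'trend', 'bitcoin', 'price']
```

If the query and the corpus are cleaned differently (for example, only one of them is lowercased), matching terms will not line up and BM25 scores drop to zero.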

Searching with BM25 (Function: search_with_bm25):

- Preprocess the query and tokenize it into words.
- Compute relevance scores for all sentences using the BM25 index.
- Add scores to a copy of the input DataFrame, sort by score, and filter to the top N results.
- Warn if the highest score is below a threshold (0.05), indicating poor matches.
- Return the top N results.
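The ranking and warning logic in the steps above can be sketched without any dependencies. This sketch uses a list of dicts instead of a DataFrame to stay self-contained; the function name top_n_results, the field names, and the sample scores are hypothetical, and in the real script the scores would come from the BM25 index.

```python
import warnings


def top_n_results(records, scores, top_n=3, min_score=0.05):
    """Attach scores to records, sort by score (highest first), keep the top N."""
    # Build new dicts so the caller's records are left unmodified.
    scored = [{**record, "score": float(score)} for record, score in zip(records, scores)]
    scored.sort(key=lambda row: row["score"], reverse=True)
    if not scored or scored[0]["score"] < min_score:
        warnings.warn("Best score is below the relevance threshold; results may be poor matches.")
    return scored[:top_n]


# Hypothetical records and scores, for illustration only.
records = [
    {"source": "site-a", "sentiment": "positive", "raw_sentence": "Bitcoin price climbs."},
    {"source": "site-b", "sentiment": "negative", "raw_sentence": "Ethereum fees spike."},
    {"source": "site-c", "sentiment": "neutral", "raw_sentence": "Bitcoin trend unclear."},
]
scores = [1.2, 0.0, 0.4]

for row in top_n_results(records, scores, top_n=2):
    print(row["source"], row["score"], row["sentiment"], row["raw_sentence"])
```

Working on copies mirrors the "copy of the input DataFrame" step, so repeated searches do not accumulate score columns on the original data.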

Execution:

- Create the BM25 index from sentiment_df.
- Test with an example query ("What is the latest trend in Bitcoin price?").
- Display the top 3 results with source, score, sentiment, and raw sentence.

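Example:

from rank_bm25 import BM25Okapi

# Each sentence is already tokenized into a list of words.
# Three sentences are used because, with only two, a term found in exactly
# one of them gets an IDF of log(1.5) - log(1.5) = 0 and every score is zero.
sentences = [["bitcoin", "mining"], ["ethereum", "crash"], ["dogecoin", "rally"]]

bm25 = BM25Okapi(sentences)

# The query must be tokenized the same way as the corpus.
scores = bm25.get_scores(["bitcoin"])

print(scores)  # Outputs: [score_for_bitcoin_mining, score_for_ethereum_crash, score_for_dogecoin_rally]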


Last updated 26 days ago
