# Evaluation

1\. **OBJECTIVES**

The objective of this project is to implement a robust validation mechanism for answers generated by a cryptocurrency-related question-answering system.

The goals include:

* Ensuring that the answers are relevant to the specified cryptocurrency.
* Detecting semantic and contextual alignment using transformer-based embeddings.
* Generating a labeled dataset that identifies whether an answer is valid or invalid.
* Enabling future integration into an NLP backend for real-time or batch processing of user queries.

2\. **DATA ACQUISITION**

Input Format:

The input data is expected to be a JSON file structured with:

* "coin": Name of the cryptocurrency (e.g., "bitcoin").
* "query": User's question.
* "answers": List of one or more answer objects, each with a "text" field.
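
Whether the file holds a single object or a list of entries is not specified above; a minimal illustrative input matching this schema (the values are examples, not taken from the actual dataset) could look like:

```json
[
  {
    "coin": "bitcoin",
    "query": "What is bitcoin's trend?",
    "answers": [
      { "text": "Bitcoin has been trading sideways this week." }
    ]
  }
]
```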

Current Source:

The function `load_data(file_path)` is used to load this JSON data locally for testing.

A placeholder for live data fetching is also present in the form of a commented-out function:

```python
import requests

def fetch_api(url):
    """Fetch query/answer data from a remote endpoint; returns None on failure."""
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        return response.json()
    return None
```

This enables future scalability where queries and answers can be fetched from a web service or user interaction layer.

3\. **LIBRARIES USED**

Core Libraries:

* json, csv: For reading/writing data.
* pandas: For tabular data manipulation.
* requests: For potential API integration.
* sentence-transformers: For semantic vector encoding using the MiniLM BERT model.
* sklearn: For TF-IDF vectorization and label encoding.

**Transformer Model:**

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
```

A lightweight and efficient model capable of encoding texts into embeddings for semantic comparison.

4\. **TEXT EXTRACTION & VALIDATION**

The function `is_valid_entry(entry)` is central to the validation logic.

**Steps:**

1. **Pre-check:** If coin is "general crypto" → always valid.
2. **Null Checks:** If query or answers is empty → invalid.
3. **Keyword Matching:**

* Checks whether the coin name is mentioned in the query or in any of the answers.

4. **Semantic Matching:**

* Embeds coin and each answer using BERT.
* Computes cosine similarity.
* Threshold: 0.5.

```python
# Embed the coin name and the answer texts, then score each pair.
cosine_scores = util.cos_sim(coin_embedding, answer_embeddings)[0]

# An entry passes the semantic check if any answer clears the threshold (0.5).
similarity_valid = any(score >= threshold for score in cosine_scores)
```

If either keyword match or semantic score passes, the entry is valid.
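
The steps above can be sketched as follows. The embedding step is abstracted behind an optional `semantic_match` callable so the sketch runs without downloading the MiniLM model; all names except `is_valid_entry` are illustrative, not the project's actual signature:

```python
def is_valid_entry(entry, semantic_match=None, threshold=0.5):
    """Return True if the entry's answers plausibly concern its coin."""
    coin = (entry.get('coin') or '').lower()
    query = entry.get('query') or ''
    answers = [a.get('text', '') for a in entry.get('answers') or []]

    # Pre-check: generic crypto questions are always accepted.
    if coin == 'general crypto':
        return True
    # Null checks: an empty query or answer list is invalid.
    if not query or not answers:
        return False
    # Keyword matching: the coin name appears in the query or any answer.
    if coin in query.lower() or any(coin in a.lower() for a in answers):
        return True
    # Semantic matching: fall back to embedding similarity when available.
    if semantic_match is not None:
        return any(semantic_match(coin, a) >= threshold for a in answers)
    return False
```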

5\. **DATA PREPROCESSING**

After validation:

* A CSV is created with the columns coin, query, status (valid/invalid), and answers.

The answers are preprocessed to extract the first answer text and format it as:

```
text: <answer_content>

```

This is written using:

```python
writer.writerow([coin, query, status, answers_str])
```

6\. **VECTORIZATION (TF-IDF)**

Once labeled:

* Text is prepared by concatenating the coin, query, and answers fields.
* The new text field is vectorized using TfidfVectorizer.

```python
df['text'] = df['coin'] + ' ' + df['query'] + ' ' + df['answers']
```

* status is encoded using LabelEncoder.

Final output:

* `vectorised.csv`: Matrix of TF-IDF features with a `status_encoded` column appended.

7\. **INTEGRATION WITH NLP MODEL AND BACKEND**

This model can be integrated with a Flask backend by:

**`process_query()` (to be implemented)**

* Takes coin + query as input.
* Returns validation results, e.g.:

```json
{
  "coin": "bitcoin",
  "query": "What is bitcoin's trend?",
  "status": "valid"
}

```
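
A minimal sketch of what `process_query` could look like; the keyword-only `is_valid_entry` stand-in below replaces the full validator purely to keep the example self-contained:

```python
def is_valid_entry(entry):
    # Keyword-only stand-in for the full validator described in section 4.
    coin = (entry.get('coin') or '').lower()
    texts = [entry.get('query') or ''] + \
            [a.get('text', '') for a in entry.get('answers') or []]
    return any(coin in t.lower() for t in texts)

def process_query(coin, query, answers=None):
    """Validate a (coin, query) pair and return a JSON-serialisable result."""
    entry = {'coin': coin, 'query': query, 'answers': answers or []}
    return {'coin': coin, 'query': query,
            'status': 'valid' if is_valid_entry(entry) else 'invalid'}
```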

**Flask Route Integration**

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/validate', methods=['POST'])
def validate():
    data = request.get_json()
    result = is_valid_entry(data)
    # Map the boolean result to the "valid"/"invalid" labels used elsewhere.
    return jsonify({'status': 'valid' if result else 'invalid'})
```

This allows real-time validation of generated answers via a REST API.

8\. **RESULTS**

**Evaluation Dataset:**

After applying `is_valid_entry()` to each row:

* 63 entries evaluated.
* Results saved to `validation_results.csv`.

**TF-IDF Matrix:**

* Shape: (63, 129) → 63 rows, 129 TF-IDF features (unique vocabulary terms).

**Label Encoding:**

* Valid: 1
* Invalid: 0

These outputs are useful for training or analyzing a downstream classifier or performing visual diagnostics like confusion matrices.

9\. **STRENGTHS & LIMITATIONS**

✅ **Strengths:**

* Combines **semantic** and **lexical** checks.
* Easy to extend to other domains.
* Modular codebase (each step is a function).
* Suitable for both **batch** and **API** use cases.

⚠️ **Limitations:**

* Fixed threshold may underfit/overfit for certain coins or queries.
* Only the **first answer** is evaluated from the list.
* No integration yet for complex multi-turn dialogues or sentiment filtering.

10\. **CONCLUSION**

This evaluation model effectively uses **modern NLP techniques** to determine the validity of answer passages in a cryptocurrency-focused QA pipeline. It supports:

* **Robust semantic checks** using Sentence-BERT
* **Flexible preprocessing and vectorization**
* **Backend-ready logic** for real-time deployment

***
