Evaluation
1. OBJECTIVES
The objective of this project is to implement a robust validation mechanism for answers generated by a cryptocurrency-related question-answering system.
The goals include:
Ensuring that the answers are relevant to the specified cryptocurrency.
Detecting semantic and contextual alignment using transformer-based embeddings.
Generating a labeled dataset that identifies whether an answer is valid or invalid.
Enabling future integration into an NLP backend for real-time or batch processing of user queries.
2. DATA ACQUISITION
Input Format:
The input data is expected to be a JSON file structured with:
"coin": Name of the cryptocurrency (e.g., "bitcoin").
"query": User's question.
"answers": List of one or more answer objects, each with a "text" field.
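An illustrative record matching this schema (the coin, question, and answer text are invented examples):

```json
{
  "coin": "bitcoin",
  "query": "What is the supply cap of bitcoin?",
  "answers": [
    { "text": "Bitcoin's supply is capped at 21 million coins." }
  ]
}
```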
Current Source:
The function load_data(file_path) is used to load this JSON data locally for testing.
A placeholder for live data fetching is also present in the form of a commented-out function.
This enables future scalability where queries and answers can be fetched from a web service or user interaction layer.
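A minimal sketch of the loader described above; the commented-out fetcher (its name and endpoint handling) is an illustrative assumption, not the project's actual stub:

```python
import json


def load_data(file_path):
    """Load coin/query/answers records from a local JSON file."""
    with open(file_path, "r", encoding="utf-8") as f:
        return json.load(f)


# Placeholder for live data fetching (hypothetical endpoint):
# def fetch_live_data(api_url):
#     import requests
#     response = requests.get(api_url, timeout=10)
#     response.raise_for_status()
#     return response.json()
```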
3. LIBRARIES USED
Core Libraries:
json, csv: For reading/writing data.
pandas: For tabular data manipulation.
requests: For potential API integration.
sentence-transformers: For semantic vector encoding using a MiniLM-based Sentence-BERT model.
sklearn: For TF-IDF vectorization and label encoding.
Transformer Model:
A lightweight and efficient model capable of encoding texts into embeddings for semantic comparison.
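The embeddings produced by the model are compared with cosine similarity; as a reference, the metric itself can be computed as:

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```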
4. TEXT EXTRACTION & VALIDATION
The function is_valid_entry(entry) is central to the model.
Steps:
Pre-check: If coin is "general crypto" → always valid.
Null Checks: If query or answers is empty → invalid.
Keyword Matching:
Checks if the coin is mentioned in the query or in any of the answers.
Semantic Matching:
Embeds the coin name and each answer using the Sentence-BERT model.
Computes cosine similarity between the embeddings.
Threshold: 0.5.
If either keyword match or semantic score passes, the entry is valid.
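The steps above can be sketched as follows. The `embed` argument is an assumption made here for testability; in the original, the SentenceTransformer encoder is presumably called directly inside the function:

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.5  # threshold from the write-up


def is_valid_entry(entry, embed):
    """Validate one record.

    `embed` maps a string to an embedding vector
    (e.g. SentenceTransformer(...).encode).
    """
    coin = (entry.get("coin") or "").lower()
    query = entry.get("query") or ""
    answers = entry.get("answers") or []

    # Pre-check: "general crypto" entries are always valid.
    if coin == "general crypto":
        return True
    # Null checks: missing query or answers -> invalid.
    if not query or not answers:
        return False

    texts = [a.get("text", "") for a in answers]
    # Keyword matching: coin mentioned in the query or any answer.
    if coin in query.lower() or any(coin in t.lower() for t in texts):
        return True
    # Semantic matching: cosine similarity between coin and each answer.
    c_vec = np.asarray(embed(coin), dtype=float)
    for t in texts:
        a_vec = np.asarray(embed(t), dtype=float)
        sim = np.dot(c_vec, a_vec) / (
            np.linalg.norm(c_vec) * np.linalg.norm(a_vec) + 1e-12
        )
        if sim >= SIMILARITY_THRESHOLD:
            return True
    return False
```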
5. DATA PREPROCESSING
After validation:
A CSV is created with columns:
coin, query, status (valid/invalid), answers.
The answers field is preprocessed so that only the first answer's text is retained for each entry. The resulting rows are written out with Python's csv module.
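A sketch of this step using the csv module; the column order follows the write-up, and the default filename matches the validation_results.csv mentioned in Section 8:

```python
import csv


def write_validation_csv(rows, out_path="validation_results.csv"):
    """Write validated records; keeps only the first answer's text."""
    fieldnames = ["coin", "query", "status", "answers"]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for r in rows:
            answers = r.get("answers") or []
            first = answers[0].get("text", "") if answers else ""
            writer.writerow({
                "coin": r["coin"],
                "query": r["query"],
                "status": r["status"],
                "answers": first,
            })
```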
6. VECTORIZATION (TF-IDF)
Once labeled:
Text is prepared by concatenating the coin, query, and answers fields.
The new text field is vectorized using TfidfVectorizer.
status is encoded using LabelEncoder.
Final output:
vectorised.csv: Matrix of TF-IDF features with status_encoded appended.
7. INTEGRATION WITH NLP MODEL AND BACKEND
This model can be integrated with a Flask backend by:
process_query() (to be implemented)
Takes coin + query as input.
Returns validation results as JSON.
Flask Route Integration
This allows real-time validation of generated answers via a REST API.
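A hedged sketch of such a route; the /validate path, the payload shape, and the simplified keyword-only stand-in validator are illustrative assumptions, not the project's actual implementation:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def is_valid_entry(entry):
    """Stand-in for the real validator: keyword check only."""
    coin = entry["coin"].lower()
    texts = [a.get("text", "") for a in entry["answers"]]
    return coin in entry["query"].lower() or any(coin in t.lower() for t in texts)


@app.route("/validate", methods=["POST"])
def validate():
    payload = request.get_json(force=True)
    entry = {
        "coin": payload.get("coin", ""),
        "query": payload.get("query", ""),
        "answers": payload.get("answers", []),
    }
    status = "valid" if is_valid_entry(entry) else "invalid"
    return jsonify({"coin": entry["coin"], "query": entry["query"], "status": status})
```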
8. RESULTS
Evaluation Dataset:
After applying is_valid_entry() to each row:
63 entries evaluated.
Results saved to validation_results.csv.
TF-IDF Matrix:
Shape: (63, 129) → 63 rows, 129 unique words (features).
Label Encoding:
Valid: 1
Invalid: 0
These outputs are useful for training or analyzing a downstream classifier or performing visual diagnostics like confusion matrices.
9. STRENGTHS & LIMITATIONS
✅ Strengths:
Combines semantic and lexical checks.
Easy to extend to other domains.
Modular codebase (each step is a function).
Suitable for both batch and API use cases.
⚠️ Limitations:
The fixed similarity threshold (0.5) may be too strict or too lenient for certain coins or queries.
Only the first answer is evaluated from the list.
No integration yet for complex multi-turn dialogues or sentiment filtering.
10. CONCLUSION
This evaluation model effectively uses modern NLP techniques to determine the validity of answer passages in a cryptocurrency-focused QA pipeline. It supports:
Robust semantic checks using Sentence-BERT
Flexible preprocessing and vectorization
Backend-ready logic for real-time deployment