# Evaluation

1\. **OBJECTIVES**

The objective of this project is to implement a robust validation mechanism for answers generated by a cryptocurrency-related question-answering system.

The goals include:

* Ensuring that the answers are relevant to the specified cryptocurrency.
* Detecting semantic and contextual alignment using transformer-based embeddings.
* Generating a labeled dataset that identifies whether an answer is valid or invalid.
* Enabling future integration into an NLP backend for real-time or batch processing of user queries.

2\. **DATA ACQUISITION**

Input Format:

The input data is expected to be a JSON file structured with:

* "coin": Name of the cryptocurrency (e.g., "bitcoin").
* "query": User's question.
* "answers": List of one or more answer objects, each with a "text" field.
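
Whether the file holds a single object or a list of entries is not specified above; a minimal illustrative input matching this schema (the values are examples, not taken from the actual dataset) could look like:

```json
[
  {
    "coin": "bitcoin",
    "query": "What is bitcoin's trend?",
    "answers": [
      { "text": "Bitcoin has been trading sideways this week." }
    ]
  }
]
```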

Current Source:

The function `load_data(file_path)` is used to load this JSON data locally for testing.

A placeholder for live data fetching is also present in the form of a commented-out function:

```python
import requests

def fetch_api(url):
    """Fetch query/answer data from a remote endpoint; returns None on failure."""
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        return response.json()
    return None
```

This enables future scalability where queries and answers can be fetched from a web service or user interaction layer.

3\. **LIBRARIES USED**

Core Libraries:

* json, csv: For reading/writing data.
* pandas: For tabular data manipulation.
* requests: For potential API integration.
* sentence-transformers: For semantic vector encoding using the MiniLM BERT model.
* sklearn: For TF-IDF vectorization and label encoding.

**Transformer Model:**

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
```

A lightweight and efficient model capable of encoding texts into embeddings for semantic comparison.

4\. **TEXT EXTRACTION & VALIDATION**

The function `is_valid_entry(entry)` is central to the validation logic.

**Steps:**

1. **Pre-check:** If coin is "general crypto" → always valid.
2. **Null Checks:** If query or answers is empty → invalid.
3. **Keyword Matching:**

* Checks whether the coin name is mentioned in the query or in any of the answers.

4. **Semantic Matching:**

* Embeds coin and each answer using BERT.
* Computes cosine similarity.
* Threshold: 0.5.

```python
# Embed the coin name and the answer texts, then score each pair.
cosine_scores = util.cos_sim(coin_embedding, answer_embeddings)[0]

# An entry passes the semantic check if any answer clears the threshold (0.5).
similarity_valid = any(score >= threshold for score in cosine_scores)
```

If either keyword match or semantic score passes, the entry is valid.
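
The steps above can be sketched as follows. The embedding step is abstracted behind an optional `semantic_match` callable so the sketch runs without downloading the MiniLM model; all names except `is_valid_entry` are illustrative, not the project's actual signature:

```python
def is_valid_entry(entry, semantic_match=None, threshold=0.5):
    """Return True if the entry's answers plausibly concern its coin."""
    coin = (entry.get('coin') or '').lower()
    query = entry.get('query') or ''
    answers = [a.get('text', '') for a in entry.get('answers') or []]

    # Pre-check: generic crypto questions are always accepted.
    if coin == 'general crypto':
        return True
    # Null checks: an empty query or answer list is invalid.
    if not query or not answers:
        return False
    # Keyword matching: the coin name appears in the query or any answer.
    if coin in query.lower() or any(coin in a.lower() for a in answers):
        return True
    # Semantic matching: fall back to embedding similarity when available.
    if semantic_match is not None:
        return any(semantic_match(coin, a) >= threshold for a in answers)
    return False
```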

5\. **DATA PREPROCESSING**

After validation:

* A CSV is created with the columns coin, query, status (valid/invalid), and answers.

The answers are preprocessed to extract the first answer text and format it as:

```
text: <answer_content>

```

This is written using:

```python
writer.writerow([coin, query, status, answers_str])
```

6\. **VECTORIZATION (TF-IDF)**

Once labeled:

* Text is prepared by concatenating the coin, query, and answers fields.
* The new text field is vectorized using TfidfVectorizer.

```python
df['text'] = df['coin'] + ' ' + df['query'] + ' ' + df['answers']
```

* status is encoded using LabelEncoder.

Final output:

* `vectorised.csv`: Matrix of TF-IDF features with a `status_encoded` column appended.

7\. **INTEGRATION WITH NLP MODEL AND BACKEND**

This model can be integrated with a Flask backend by:

**`process_query()` (to be implemented)**

* Takes coin + query as input.
* Returns validation results, e.g.:

```json
{
  "coin": "bitcoin",
  "query": "What is bitcoin's trend?",
  "status": "valid"
}

```
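
A minimal sketch of what `process_query` could look like; the keyword-only `is_valid_entry` stand-in below replaces the full validator purely to keep the example self-contained:

```python
def is_valid_entry(entry):
    # Keyword-only stand-in for the full validator described in section 4.
    coin = (entry.get('coin') or '').lower()
    texts = [entry.get('query') or ''] + \
            [a.get('text', '') for a in entry.get('answers') or []]
    return any(coin in t.lower() for t in texts)

def process_query(coin, query, answers=None):
    """Validate a (coin, query) pair and return a JSON-serialisable result."""
    entry = {'coin': coin, 'query': query, 'answers': answers or []}
    return {'coin': coin, 'query': query,
            'status': 'valid' if is_valid_entry(entry) else 'invalid'}
```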

**Flask Route Integration**

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/validate', methods=['POST'])
def validate():
    data = request.get_json()
    result = is_valid_entry(data)
    # Map the boolean result to the "valid"/"invalid" labels used elsewhere.
    return jsonify({'status': 'valid' if result else 'invalid'})
```

This allows real-time validation of generated answers via a REST API.

8\. **RESULTS**

**Evaluation Dataset:**

After applying `is_valid_entry()` to each row:

* 63 entries evaluated.
* Results saved to `validation_results.csv`.

**TF-IDF Matrix:**

* Shape: (63, 129) → 63 rows, 129 TF-IDF features (unique vocabulary terms).

**Label Encoding:**

* Valid: 1
* Invalid: 0

These outputs are useful for training or analyzing a downstream classifier or performing visual diagnostics like confusion matrices.

9\. **STRENGTHS & LIMITATIONS**

✅ **Strengths:**

* Combines **semantic** and **lexical** checks.
* Easy to extend to other domains.
* Modular codebase (each step is a function).
* Suitable for both **batch** and **API** use cases.

⚠️ **Limitations:**

* Fixed threshold may underfit/overfit for certain coins or queries.
* Only the **first answer** is evaluated from the list.
* No integration yet for complex multi-turn dialogues or sentiment filtering.

10\. **CONCLUSION**

This evaluation model effectively uses **modern NLP techniques** to determine the validity of answer passages in a cryptocurrency-focused QA pipeline. It supports:

* **Robust semantic checks** using Sentence-BERT
* **Flexible preprocessing and vectorization**
* **Backend-ready logic** for real-time deployment

***
