Welcome to Explainable Bag-of-Concepts’s documentation!
Bag-Of-Concepts (BOC) Implementation
The Bag-Of-Concepts (BOC) implementation is an advanced text processing module designed to enhance document embedding techniques by adding explainability.
In comparsion to BERTopic
c-CF-IDF normalization
Explainable AI - compatibility with SHAP
Calculate BIC, AIC using GMMs, silhouette, davies and calinski scores using a user-specified clustering method for a given list of values for K (number of concepts).
Limitations
Spherical KMeans is slow.
Cluster pollution of names in vector space (probably make 2D plots)
Not the best scores most likely due to word vectors (in comparison to the BoC)
Changelog of the project in comparsion to BoC
This project implements a flexible BoC module with automatic concept labelling using LLMs.
- Automatic Concept Labeling
The user can use our predefined prompts for OpenAI’s GPT3.5-Turbo
The user can provide his custom LangChain chain, that we invoke with the words that have to be labelled
The user can specify how many of the top N words belonging to a cluster to use
- Flexible Clustering
Spheircal KMeans (default one; used in the BoC paper)
KMeans
Spectral
Ability to encode new documents
Ability to save and load the model
Get the top N words for a concept.
Calculate BIC, AIC using GMMs, silhouette, davies and calinski scores using a user-specified clustering method for a given list of values for K (number of concepts).
- The output is compatible with SHAP values visualizations
The user can train any kind of model and use SHAP to visualize the feature importance. Examples: