Features Detective App

Date of creation: 2024-10-30

Project description:
The aim of the project was to build a general-purpose application that identifies the most important features in a given dataset. The user uploads their own data or loads a ready-made dataset in a supported format, then either selects the target column manually or lets the application detect it automatically. The application then generates a feature importance chart showing which features have the greatest impact on the selected column, together with a clear description of the chart and recommendations on what could be changed to improve the results for that column.

Main functionalities:
- the user can load a CSV/JSON file with data or use a ready-made sample dataset,
- the user selects the target column manually or uses automatic target column detection (performed by an LLM),
- the application automatically recognizes whether the loaded data represents a regression or a classification problem and selects the appropriate model training approach on that basis,
- based on the trained model, a chart of the most important features is displayed,
- finally, the user receives a clear description of the chart along with recommendations on which actions to take to improve the results for the analyzed target column.
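
The regression-versus-classification step can be illustrated with a minimal sketch. The rule below is an assumption made for illustration, not the logic used in the application: the function name `infer_problem_type` and the `max_classes` threshold are hypothetical. It treats a target column as a classification target when it contains non-numeric values or only a small number of distinct integer-like values, and as a regression target otherwise.

```python
from typing import Any, Iterable


def infer_problem_type(values: Iterable[Any], max_classes: int = 20) -> str:
    """Guess whether a target column suits classification or regression.

    Hypothetical heuristic (not taken from the app's source code):
    - any non-numeric value            -> "classification"
    - few distinct integer-like values -> "classification"
    - anything else                    -> "regression"
    """
    # Drop missing entries such as None or empty CSV cells.
    cleaned = [v for v in values if v is not None and v != ""]
    if not cleaned:
        raise ValueError("target column has no usable values")

    numeric = []
    for v in cleaned:
        # Booleans are labels, even though Python treats them as numbers.
        if isinstance(v, bool):
            return "classification"
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            # Text labels such as "yes"/"no" cannot be regression targets.
            return "classification"

    distinct = set(numeric)
    integer_like = all(x.is_integer() for x in distinct)
    if integer_like and len(distinct) <= max_classes:
        return "classification"
    return "regression"


if __name__ == "__main__":
    print(infer_problem_type(["yes", "no", "yes"]))
    print(infer_problem_type([12.5, 48.1, 7.25]))
```

The returned label could then decide which PyCaret module to use, `pycaret.classification` or `pycaret.regression`. A threshold-based rule like this one misclassifies some columns (for example, an integer rating from 1 to 5 that is meant to be predicted as a number), so a real application would likely let the user override the guess.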

To train the model I used PyCaret; the implementation is included in a notebook available for download:
Download Notebook: Model training

Skills:
- Python,
- Langfuse,
- OpenAI,
- Streamlit,
- PyCaret (Classification & Regression),
- Pandas,
- Matplotlib,
- Instructor,
- Pydantic,
- Boto3.

The application has been deployed on Streamlit Community Cloud and is available for public use.

Link to the repository: https://github.com/kasjansmigielski/feature_detective_app
Link to the app: https://feature-detective.streamlit.app/