CABAS API
Tools & Technologies
ETL Techniques: Extract, Transform, Load
Data Storage: Microsoft Azure Storage Explorer
Data Warehouse: Snowflake
Visualization: Tableau
Languages: Python, SQL
Situation
The CABAS API project was developed to enhance the data collection, storage, and processing capabilities for Svedea AB's car insurance business. The primary goal was to integrate data from vehicle workshops across Sweden into Svedea's data warehouse, enabling deeper insights into vehicle damage patterns and repair costs. This data was crucial for improving insurance risk assessments, refining pricing strategies, and optimizing the overall insurance process.
Task
The main tasks involved:
Extracting and integrating data from the CABAS Verkstad system, which details repair times and costs, into Svedea's Snowflake data warehouse.
Developing a robust ETL pipeline to handle the data ingestion process, ensuring that the data was accurately and efficiently processed and stored.
Providing a reliable and scalable data infrastructure that supports advanced analytics, particularly in the areas of risk assessment and pricing strategies for vehicle insurance.
Action
The project was executed in two primary phases:
Analytics Phase:
Data Structuring: Initially, raw and master tables were created for each CABAS file in Snowflake, organizing the data for detailed analysis. Views and surrogate keys were developed to ensure accurate and efficient data extraction.
Data Loading: The structured data was loaded into Snowflake, providing a solid foundation for further analysis and integration with existing insurance data.
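Surrogate keys like those used in the structuring step can be generated deterministically from a record's natural key columns, so re-ingesting the same CABAS file never produces new identifiers for existing rows. A minimal sketch in Python, hashing pipe-joined business keys (the column values shown are hypothetical, not actual CABAS fields):

```python
import hashlib

def surrogate_key(*natural_keys: str) -> str:
    """Derive a deterministic surrogate key by hashing the
    normalized, pipe-joined natural key columns."""
    joined = "|".join(str(k).strip().upper() for k in natural_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Same business keys always yield the same surrogate key,
# even if casing or whitespace differs between loads.
key_a = surrogate_key("WS-1042", "2023-04-01", "CALC-77")
key_b = surrogate_key("ws-1042 ", "2023-04-01", "CALC-77")
assert key_a == key_b  # normalization keeps the key stable
```

Hashing natural keys (rather than using an auto-increment sequence) keeps keys stable across full reloads, which simplifies joining raw and master tables.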
Pipeline Phase:
Pipeline Automation: A Python-based ETL pipeline was developed to automate the data ingestion and processing workflow. This pipeline was designed to handle large volumes of data while ensuring accuracy and efficiency.
Testing and Deployment: The pipeline was rigorously tested in a secure environment to ensure its stability and reliability. Once validated, the process was moved to production.
Monitoring: A Tableau dashboard was implemented to monitor the data loading process. It provides real-time insights into the data pipeline’s performance and highlights any issues for immediate resolution.
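The pipeline phase above can be sketched as a small extract/transform/load loop. The sketch below is illustrative only: it uses an in-memory SQLite database in place of Snowflake, and the file format, column names, and table name are assumptions rather than the actual CABAS schema.

```python
import csv
import io
import sqlite3

def extract(raw_csv: str) -> list[dict]:
    """Parse a CABAS export (CSV is assumed here for illustration)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[tuple]:
    """Cast repair hours and cost to numbers; drop incomplete rows."""
    out = []
    for r in rows:
        if not r.get("calculation_id"):
            continue  # skip rows missing the business key
        out.append((r["calculation_id"],
                    float(r["repair_hours"]),
                    float(r["repair_cost"])))
    return out

def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Upsert into the target table (Snowflake would use MERGE;
    SQLite's INSERT OR REPLACE stands in for it here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cabas_repairs ("
        "calculation_id TEXT PRIMARY KEY, "
        "repair_hours REAL, repair_cost REAL)")
    conn.executemany(
        "INSERT OR REPLACE INTO cabas_repairs VALUES (?, ?, ?)", rows)
    conn.commit()

raw = "calculation_id,repair_hours,repair_cost\nC1,4.5,8200\nC2,2.0,3100\n"
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(raw)))
count = conn.execute("SELECT COUNT(*) FROM cabas_repairs").fetchone()[0]
```

Keeping extract, transform, and load as separate functions makes each stage independently testable in a secure environment before the whole flow is promoted to production.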
Result
Integrating CABAS data into Svedea's Snowflake data warehouse significantly enhanced the company’s data-driven decision-making capabilities. The key outcomes included:
Enhanced Analytics: The availability of detailed vehicle damage data allowed for advanced machine learning analyses, leading to a comprehensive overhaul of vehicle insurance pricing models.
Improved Risk Assessment: The integrated data provided deeper insights into vehicle damage patterns and repair costs, improving the accuracy of risk assessments and supporting the development of more competitive insurance products.
Operational Efficiency: The automated ETL pipeline reduced manual processing efforts, ensuring a more efficient and reliable data ingestion process.
Challenges & Solutions
Challenge: The development of the ETL pipeline encountered issues with duplicate data, which posed a risk to data integrity and analysis accuracy.
Solution: The team refined the pipeline code to include a de-duplication process, ensuring that only unique and relevant data was merged into Snowflake. Additionally, certain tables were excluded from the data warehouse due to their lack of relevance, streamlining the data processing efforts.
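The de-duplication fix can be illustrated with a keep-latest strategy: before rows are merged into the warehouse, the batch is reduced to one row per business key, retaining the most recent version. A sketch, assuming hypothetical key and timestamp field names:

```python
def deduplicate(rows: list[dict],
                key: str = "calculation_id",
                ts: str = "updated_at") -> list[dict]:
    """Keep only the most recent row per business key so the
    downstream merge never sees duplicate keys."""
    latest: dict[str, dict] = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[ts] > latest[k][ts]:
            latest[k] = row
    return list(latest.values())

rows = [
    {"calculation_id": "C1", "updated_at": "2023-01-01", "cost": 100},
    {"calculation_id": "C1", "updated_at": "2023-02-01", "cost": 120},
    {"calculation_id": "C2", "updated_at": "2023-01-15", "cost": 90},
]
unique = deduplicate(rows)  # C1 keeps its newer version; C2 unchanged
```

Doing this in the pipeline code, before the merge statement runs, keeps the warehouse's merge logic simple and protects data integrity even when a source file is delivered twice.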
Impact & Contributions
The successful integration of CABAS data into Svedea's data warehouse had a profound impact on the company's vehicle insurance operations. It provided crucial insights that led to a complete revision of insurance tariffs, aligning them more closely with actual risk profiles. This project significantly enhanced Svedea's ability to make data-driven decisions, directly contributing to more accurate pricing strategies and improved customer satisfaction.
Conclusion
This project demonstrates technical expertise in data integration, ETL pipeline development, and modern data warehousing. It highlights the ability to tackle challenges in data processing, ensure data quality, and drive impactful business outcomes: key competencies for a data engineering role.
Visuals, Samples & Link to Full Project
Visuals and links are unavailable due to the data's confidentiality and internal nature.