Data Services: Case Studies

Modernizing data warehousing for a healthcare software provider

The client is a major software provider for home and community-based healthcare providers and Medicaid managed care payers. The client wanted to modernize its data warehouse to achieve three goals: near real-time data availability for reporting and analytics, faster reaction to source schema changes, and better scalability through a cloud-based architecture. Given its existing investments in Amazon Web Services (AWS) infrastructure, the client was seeking an end-to-end cloud solution with Amazon Redshift as the data warehouse.

Due to the high stakes of the solution selection decision, we recommended a thorough proof-of-concept and assembled a team of subject matter experts in data architecture, data engineering, and DevOps. After reviewing the current data model, we conducted an architecture review and identified and costed potential tool options. Our engineers set up a development environment and ran an eight-week proof-of-concept in which we tested both historical and continuous-replication data loads to the new warehouse, created infrastructure and automation scripts, and built sample dashboards. The exercise demonstrated that the client’s requirements could best be met by a combination of AWS tools (Amazon Redshift and AWS Database Migration Service) coupled with Talend, an Extract, Transform, and Load (ETL) tool that Anoteros helped select and license on behalf of the client after testing several market-leading ETL tools.

With the client’s approval to move forward, we set up the production environment, managed the conversion of historical data loads, set up monitoring, alerting, and reconciliation mechanisms, and created both automated scripts and procedural manuals for steady-state management. After the new warehouse went live, we continued to support the client team, running the old and new data pipelines in parallel for a few weeks to confirm that the new solution was performing as expected. During this phase, we also built more robust monitoring dashboards and trained stakeholders to use them, before ultimately disconnecting the old data warehouse and pipeline.
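To make the reconciliation step during a parallel run concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the scripts delivered on the project: the function names, the in-memory row format, and the choice of a row count plus an order-independent checksum are all hypothetical. In practice the rows would come from queries against the old and new warehouses.

```python
from __future__ import annotations

import hashlib
from typing import Iterable

Row = tuple  # one warehouse row as a tuple of column values (hypothetical format)

MODULUS = 2 ** 256  # keeps the running checksum at a fixed width


def table_fingerprint(rows: Iterable[Row]) -> tuple[int, str]:
    """Return (row_count, checksum); the checksum does not depend on row order."""
    count = 0
    total = 0
    for row in rows:
        count += 1
        row_hash = hashlib.sha256(repr(row).encode("utf-8")).digest()
        # Summing per-row hashes makes the result order-independent
        # while still counting duplicate rows.
        total = (total + int.from_bytes(row_hash, "big")) % MODULUS
    return count, f"{total:064x}"


def reconcile(old: dict[str, list[Row]], new: dict[str, list[Row]]) -> dict[str, str]:
    """Compare every table present in either pipeline and report its status."""
    report = {}
    for table in sorted(set(old) | set(new)):
        if table not in old or table not in new:
            report[table] = "missing"
            continue
        old_fp = table_fingerprint(old[table])
        new_fp = table_fingerprint(new[table])
        if old_fp == new_fp:
            report[table] = "match"
        elif old_fp[0] != new_fp[0]:
            report[table] = "row_count_mismatch"
        else:
            report[table] = "content_mismatch"
    return report


if __name__ == "__main__":
    old = {"visits": [(1, "a"), (2, "b")], "claims": [(1, 10.0)]}
    new = {"visits": [(2, "b"), (1, "a")], "claims": [(1, 12.5)]}
    print(reconcile(old, new))
```

A report like this could feed the monitoring dashboards: any table whose status is not `match` would be a candidate for an alert before the old pipeline is switched off.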

Anoteros’ approach to this complex data warehouse migration gave the client the confidence to make vendor and tool selection decisions, knowing that the chosen tools could meet its needs. Ultimately, the client met its goals with the implementation of the new data solution, including near real-time replication from the source systems to the data warehouse. Throughout the project, we complemented the in-house data team’s understanding of the existing data model with expertise in cloud-based data warehousing, and we guided and coached the team so that, when our project ended, they could operate and maintain the new environment independently.

Care management data migration for a health plan in the US Midwest

The client, a health plan offering Medicaid plans in the Midwest, had launched the implementation of HealthEdge®’s GuidingCare® platform. Unlike its legacy care management system, GuidingCare® would allow integration with other applications, workflow automation, and importantly, the ability to easily scale and therefore support the health plan’s goal of expanding into new markets. Unfortunately, the legacy system vendor was not willing to participate in this transition and instead opted to provide raw data extracts of the client’s clinical records without documentation on the data model.

Anoteros was hired to migrate the data from the legacy system to GuidingCare®. We tapped into our deep healthcare payer experience, knowledge of care management, and expertise in data and technology. We brought together a team of data and business analysts, developers, and a project manager. The team hit the ground running, quickly learned the data model of the legacy system, and developed a data migration and archival strategy. The strategy not only included the steps to migrate data into GuidingCare® but also included steps to address the front-end and workflow impacts once the data was migrated into GuidingCare®. Our team created the data mapping specifications, developed the data migration routines, performed system testing, and facilitated the user acceptance testing of the migrated data in GuidingCare®.

The data migration was successful and supported a seamless transition between care management platforms for the end-users. The data archival strategy positioned the health plan to easily respond to future audits. Overall, the data migration allowed the client to complete the implementation project and turn their focus to expansion efforts.

Automating Regulatory Intelligence and Property Address Discovery with AI for Enhanced Market Insights

Our client, a leading provider of short-term rental (STR) analytics, offers a subscription-based service that helps real estate investors, vacation rental hosts, and property managers better understand and navigate the rental market. By tracking the performance of over 10 million listings, the organization provides insights into pricing, revenue potential, and market comparisons.

The client aimed to deepen its market intelligence by leveraging Artificial Intelligence (AI) to expand its capabilities in two key areas: collecting and organizing regulatory data across targeted cities, and accurately matching property addresses across fragmented online listings. The primary obstacles were the time- and labor-intensive nature of the manual processes and the inability to scale them to the number of cities and volume of data involved.

Anoteros was tasked by the client to enhance their commercial offerings by automating regulatory data collection and address discovery for listings. Our approach followed a structured, multi-phase methodology: Assess, Prototype, Review, and Implement.

Our engagement began with the Assess phase. During this initial stage, we conducted a current state assessment, looking at existing data sources, experimenting with image comparison, identifying AI/ML opportunities, evaluating new data sources, and developing a milestone plan. This involved reviewing data sources with the client’s internal SMEs and identifying markets for a Proof of Concept (POC).

Following the assessment, we moved into the Prototype phase. In this phase, Anoteros built and validated automated prototypes for both the regulatory data and the address discovery POCs, leveraging learnings from the Assess phase. Each prototype incorporated AI/ML models, and we developed automated validation methods to measure how well the solutions worked.

The project then progressed to the Review phase. Here, we aggregated the findings from the Prototype phase into a summary, developed recommendations and an implementation plan with a proposed budget, and reviewed them with the client’s leadership team.

Requirement 1: Automating Regulatory Data Discovery & Insights for Short-Term Rentals

Business Challenge & Requirements: 
The organization sought to gather city-specific short-term rental regulations and automatically generate clear summaries of complex information such as permitting requirements, zoning rules, and other local restrictions.
The inherent challenges with this requirement included:

  • Data Fragmentation and Volatility: Regulatory information is not centralized. It is scattered across disparate sources such as municipal websites, city planning documents, public announcements and discussion forums. The frequent updates to these regulations mean any manually collected data quickly becomes obsolete.
  • Semantic Complexity: Municipal regulations are written in dense, legal language. An automated solution must be able to parse this language to accurately identify and extract specific data points such as duration limits, occupancy rules, permit fees, and zoning restrictions.
  • Scalability: Manually scaling the research process to over 100 cities is operationally and financially unfeasible. Each new city adds a significant burden on the analyst team, making a manual approach impossible to maintain at scale.
  • Trust and Reliability: A core challenge for any automated solution is establishing trust. The system must not only retrieve information but also have confidence in the authority of its sources. A significant technical hurdle is mitigating the risk of LLM “hallucinations” (cases where the AI generates plausible but incorrect information) to ensure data trust.

Our Solution & Technical Implementation: 
To overcome these challenges, Anoteros engineered an AI agent designed for automated data discovery and insight generation. This system was built to dynamically crawl the web and extract up-to-date regulatory information from trusted sources.

Our implementation involved several key techniques:

  • Multi-Model AI Agent: We developed an AI agent that combines several Large Language Models (LLMs), including Gemini, Claude, and models from OpenAI and Perplexity, to ensure high accuracy and an appropriate tone.
  • Intelligent Source Prioritization: The system leverages Google Search for real-time web discovery. To ensure source reliability, our scraping logic was designed to prioritize content from top-ranked search results and give special weight to information in Google’s featured answers, using search engine ranking as a strong proxy for source authority. This aligns with our goal of parsing only trusted content.
  • Retrieval-Augmented Generation (RAG): To ensure contextual grounding and mitigate model confabulation, we implemented a Retrieval-Augmented Generation (RAG) framework. This system dynamically retrieves the most pertinent and up-to-date text segments from our prioritized sources based on specific evaluation criteria (e.g., ‘maximum occupancy,’ ‘permit fees’). The retrieved content is stored, indexed, and provided as factual context to the LLM, constraining it to generate answers that are grounded in the source material.
  • Ensemble Learning for Accuracy: Rather than relying on a single model, we employed an ensemble technique. The system queries multiple leading LLMs simultaneously (including Gemini, Claude, OpenAI, and Perplexity models) and then synthesizes their outputs. This approach is designed to produce a more robust and accurate result than a single model would on its own, because the models cross-verify one another, reducing the risk of model-specific errors or hallucinations.
  • Robust Fact-Verification: After the ensemble generates an answer, dedicated fact-verification layers cross-reference the synthesized information against the prioritized source documents to catch hallucinations and ensure the highest degree of data trust.
  • Traceability for Human Review: To facilitate a final layer of human oversight, the system’s output explicitly includes the source URLs as citations alongside the AI-generated answers. This critical feature enables a human-in-the-loop review process, allowing the client’s analysts to quickly and easily validate the data’s origin before publication, cementing confidence in the final product.
  • Advanced Data Processing: We used semantic clustering to group similar information, avoid data duplication, and ensure comprehensive coverage of all relevant regulations. The system was also designed with flexible output formatting for seamless integration into the client’s downstream products.
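The source-prioritization and retrieval ideas above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `Passage` fields, the scoring formula (term overlap multiplied by an authority weight derived from search rank), the `featured_boost` value, and the prompt wording are hypothetical, and simple keyword overlap stands in for whatever retrieval method the production system uses.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    url: str
    search_rank: int  # 1 = top search result; must be >= 1 (hypothetical field)
    featured: bool    # True if the text came from a featured answer
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(passage: Passage, criterion: str, featured_boost: float = 1.5) -> float:
    """Relevance (term overlap) weighted by source authority (search rank)."""
    terms = tokenize(criterion)
    if not terms:
        return 0.0
    overlap = len(terms & tokenize(passage.text)) / len(terms)
    authority = 1.0 / passage.search_rank
    if passage.featured:
        authority *= featured_boost
    return overlap * authority


def retrieve(passages: list[Passage], criterion: str, k: int = 3) -> list[Passage]:
    """Return up to k passages with a non-zero score, best first."""
    scored = [(score(p, criterion), p) for p in passages]
    relevant = [(s, p) for s, p in scored if s > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in relevant[:k]]


def build_prompt(criterion: str, passages: list[Passage]) -> str:
    """Assemble a grounded prompt that numbers each source for citation."""
    context = "\n".join(
        f"[{i}] {p.text} (source: {p.url})" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer using only the numbered sources below and cite them.\n"
        f"Question: what does the city require regarding {criterion}?\n"
        f"{context}"
    )


if __name__ == "__main__":
    corpus = [
        Passage("https://example.gov/str", 1, False,
                "Short-term rental permit fees are $150 per year."),
        Passage("https://example.com/forum", 5, False,
                "I think permit fees went up."),
        Passage("https://example.gov/zoning", 2, True,
                "Maximum occupancy is two guests per bedroom."),
    ]
    print(build_prompt("permit fees", retrieve(corpus, "permit fees", k=2)))
```

Passing only the retrieved, numbered passages to the model is what keeps each answer tied to a source URL that an analyst can later check.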

Diagram 1: Regulatory Data Hybrid Approach Workflow – A flowchart illustrating the process from Google Search result ranking and scraping, feeding content to an LLM, generating tailored answers, populating a database with source links and content, and finally, manual review
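The later stages of this workflow (combining several model answers, checking the result against source text, and attaching citations for manual review) can be sketched in Python as follows. This is a simplified illustration, not the production logic: the model names are placeholders, no real LLM API is called, a plain majority vote stands in for the synthesis step, and a substring check stands in for the fact-verification layers. The `min_agreement` threshold is likewise an assumed value.

```python
from __future__ import annotations

from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so equivalent answers compare equal."""
    return " ".join(text.lower().split())


def synthesize(answers: dict[str, str]) -> tuple[str, float]:
    """Majority vote across model answers; returns (answer, agreement ratio)."""
    if not answers:
        raise ValueError("at least one model answer is required")
    votes = Counter(normalize(a) for a in answers.values())
    winner, count = votes.most_common(1)[0]
    return winner, count / len(answers)


def supporting_sources(answer: str, sources: dict[str, str]) -> list[str]:
    """URLs of the source documents whose text contains the answer verbatim."""
    needle = normalize(answer)
    return [url for url, text in sources.items() if needle in normalize(text)]


def review_record(
    criterion: str,
    answers: dict[str, str],
    sources: dict[str, str],
    min_agreement: float = 0.5,
) -> dict:
    """Build the row an analyst would see: answer, citations, and a review flag."""
    answer, agreement = synthesize(answers)
    citations = supporting_sources(answer, sources)
    verified = agreement > min_agreement and bool(citations)
    return {
        "criterion": criterion,
        "answer": answer,
        "agreement": agreement,
        "citations": citations,
        "needs_review": not verified,
    }


if __name__ == "__main__":
    answers = {  # placeholder model names and outputs
        "model_a": "$150 per year",
        "model_b": "$150  per year",
        "model_c": "$200 per year",
    }
    sources = {
        "https://example.gov/str": "The permit fee is $150 per year.",
        "https://example.com/blog": "Fees vary by city.",
    }
    print(review_record("permit fees", answers, sources))
```

Records flagged with `needs_review` would go to an analyst first, while every record carries its citations so the origin of an answer can be confirmed before publication.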

Requirement 2: Automating Address Discovery

Business Challenge & Requirements: 
The client sought a method to discover the exact street addresses of property listings found on leading short-term rental websites. This address data is a critical missing link needed to derive valuable data correlations—such as comparing a listing’s performance to local real estate values or neighborhood trends—and ultimately improve their product offerings.
The challenges associated with this requirement are significant:

  • Deliberate Data Obfuscation: STR platforms intentionally conceal exact property addresses for the privacy and security of hosts. They typically only show a general radius on a map, making direct extraction impossible.
  • Lack of a Single Source: There is no straightforward way to find an address. The solution requires a creative, multi-modal approach that can piece together clues from different data types (images, text) and external sources.
  • Scalability: The solution needed to be viable for millions of properties across hundreds of cities, demanding a highly efficient and cost-effective architecture to handle massive volumes of data processing.

Our Solution & Technical Implementation: 
We engineered a creative and highly effective automated address discovery solution. Our initial research explored several potential methods, including attempting to match property photos with Google Maps Street View imagery. However, we ultimately developed a more robust and scalable multi-step process that yielded superior results.
The core components of our final solution include:

  1. Image Acquisition & Classification: For a given listing, we first downloaded all available images. We then used OpenAI CLIP for image classification to analyze the photos and identify specific, high-value images, such as exterior shots showing the front elevation of the property.
  2. Targeted Reverse Image Search: The classified front elevation images were then used to perform a reverse image search using Google Lens. To avoid false positives, we configured the system to only consider exact image matches from the search results.
  3. Source Prioritization & Extraction: We prioritized results from reputable real estate websites (e.g., Zillow, Compass) and other sources from which an address could be reliably extracted. The system then scraped the precise street address from these high-priority pages.
  4. Geocoding & Validation: The extracted addresses were geocoded into latitude and longitude coordinates. We then calculated the distance between the discovered coordinates and the approximate location shown on the STR website’s listing to assess accuracy.
  5. Refined Validation Logic: To ensure the highest quality, we developed a rules-based validation system. This included automatically selecting the closest address match or flagging a listing for manual review if the calculated distance exceeded a set threshold or if search results pointed to multiple different street addresses.
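Steps 4 and 5 can be illustrated with a short Python sketch. It assumes the candidate addresses have already been geocoded, uses the haversine formula for distance, and applies one possible reading of the rules: accept the closest candidate unless it is too far from the listing’s approximate location or the candidates disagree on the street address. The function names, the result format, and the 500-meter threshold are hypothetical.

```python
from __future__ import annotations

import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

Point = tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in meters."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlambda = math.radians(b[1] - a[1])
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def validate(
    listing_point: Point,
    candidates: list[tuple[str, Point]],
    max_distance_m: float = 500.0,  # assumed threshold, for illustration only
) -> dict:
    """Pick the closest candidate address or flag the listing for manual review."""
    if not candidates:
        return {"status": "manual_review", "reason": "no_candidates", "address": None}

    # Rank candidates by distance from the listing's approximate location.
    ranked = sorted(candidates, key=lambda c: haversine_m(listing_point, c[1]))
    address, point = ranked[0]
    distance = haversine_m(listing_point, point)

    if distance > max_distance_m:
        return {"status": "manual_review", "reason": "too_far", "address": address}
    if len({addr for addr, _ in candidates}) > 1:
        return {"status": "manual_review", "reason": "conflicting_addresses",
                "address": address}
    return {"status": "accepted", "reason": None, "address": address}


if __name__ == "__main__":
    listing = (41.8781, -87.6298)  # approximate location shown on the listing
    found = [("123 Example St", (41.8790, -87.6300))]
    print(validate(listing, found))
```

Even when a listing is flagged, the sketch returns the closest address as a suggestion, so a reviewer starts from the most likely match rather than from scratch.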

This entire solution was designed to be highly scalable, capable of processing millions of properties across hundreds of cities.

Diagram 2: Address Matching Workflow – A flowchart depicting the flow from property image download, classification by OpenAI CLIP, reverse image search via Google Lens, extraction of addresses from search results, conversion to lat/long, distance calculation, and validation

Anoteros successfully addressed the client’s two core problems, delivering automated outputs for regulatory insights and high-accuracy address matches. Anoteros helped the client to significantly reduce manual processes with scalable and highly accurate solutions.

The automation significantly reduced manual costs and improved operational efficiency. It also attained 90% accuracy in data validation. Today, the client is actively integrating these tools into its internal workflows, hiring a dedicated regulatory research analyst, and exploring budget expansion to further productionize and scale the address matching process. This strategic partnership has helped our client build a stronger foundation for continued growth and enhanced market intelligence, and has opened up new revenue streams through enhanced product offerings.
