Collaborative problem-solving journey with one of the largest environmental tech startups
This success story is about our engagement with a leading water treatment company in Europe. This award-winning, innovative startup from the Netherlands is trusted by 100+ clients across 55+ countries, with 10K+ sensors worldwide generating over 1 million new observations per day. Their research teams work closely with end customers, government or private bodies responsible for managing drinking-water reservoirs and lakes, by installing sensors at these locations and manually reviewing the sensor data on a regular basis to advise on the required water treatments and their specifications.
While they lead in specific water treatment techniques, they wanted to tackle the problem of water contamination head-on with a proactive, prognosis-based approach in which the growth of contamination is forecast weeks in advance. The benefit was two-fold. First, these early signals translate into direct treatment-cost savings for the end customer across water reservoirs, drinking-water lakes and other water sources. Second, a simple-to-use data dashboard for the company's team and the customer, where the data for a specific node is loaded and ML-based predictions are charted for the upcoming weeks, lets them assess the situation and orchestrate the necessary actions. With this data automation, the research teams can also carry out deeper data analysis, observing patterns and correlations periodically and especially monitoring deviations before and after treatment. This streamlined approach substantially improves research team productivity, and the insights drive customization of the solutions offered to end customers.
The approach and challenges
The first step was to identify the core goals and expectations of all the stakeholders across the value chain: the management, the customer-facing teams and the end customers themselves.
After the initial discovery phase, as we call it, we proposed a tentative roadmap covering the steps to take, the resources required from either side, communication protocols and possible pitfalls with contingency planning. The approach is always iterative and depends on factors such as the definition of the problem in focus, stakeholder priorities and data findings.
At TotemX, we ensure strong collaboration through regular stakeholder meetings where we present our findings, analysis and any roadblocks, and propose solutions to keep moving forward. In data science, and specifically in the machine learning domain, it is essential to see small wins, share the learning and keep iterating until the final goal is reached. For example, instead of beginning with large volumes of noisy data, we selected ten candidate sites with at least one year of data available. While they presented real-world data management challenges such as anomalies, missing values, skewness, erroneous entries and, in some cases, large data gaps, these were mitigated through thorough data quality investigation and the correction techniques used by experienced data engineers at TotemX. Data acquisition and data engineering together form the most effort-intensive phase and are critical to the success of any ML lifecycle, which the client well understood and appreciated.
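As a flavour of what such correction techniques involve, here is a minimal, hypothetical sketch in pandas. It assumes a single sensor column named `turbidity` (an illustrative name, not from the project): gross outliers are masked via a robust z-score, and only short gaps are interpolated, leaving longer gaps untouched for manual review. The actual thresholds and pipeline used in the engagement are not shown here.

```python
import numpy as np
import pandas as pd

def clean_sensor_series(df: pd.DataFrame, value_col: str = "turbidity",
                        max_gap: int = 3, z_thresh: float = 3.0) -> pd.DataFrame:
    """Mask gross outliers (robust z-score) and fill only short gaps."""
    s = df[value_col].astype(float)
    med = s.median()
    mad = (s - med).abs().median()  # median absolute deviation
    if mad > 0:
        z = 0.6745 * (s - med) / mad  # robust z-score
        s = s.mask(z.abs() > z_thresh)  # outliers become NaN
    # Interpolate interior gaps of up to `max_gap` points; longer gaps stay NaN
    s = s.interpolate(limit=max_gap, limit_area="inside")
    out = df.copy()
    out[value_col] = s
    return out

# Tiny demo: one implausible spike (100.0) among normal readings
raw = pd.DataFrame({"turbidity": [1.0, 2.0, 100.0, 4.0, 5.0]})
clean = clean_sensor_series(raw)
```

Keeping long gaps as missing, rather than filling them blindly, preserves the distinction between "no data" and "reconstructed data" for the research team's review.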
In the data analysis phase, both sides exchanged subject matter expertise, allowing deeper discussions, verification of certain hypotheses and new findings about influential factors that the domain experts might not have considered earlier. Subsequently, feature selection and engineering were carried out, with the impact of the newly engineered data points proven statistically.
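To give a sense of what such feature engineering can look like for sensor time series, the sketch below builds hypothetical lag and rolling-window features for a daily variable. The column names, lags and window sizes are illustrative assumptions, not the features used in the project.

```python
import pandas as pd

def add_time_features(df: pd.DataFrame, col: str,
                      lags=(1, 7), roll: int = 7) -> pd.DataFrame:
    """Add lagged copies and a rolling mean of `col` as candidate features."""
    out = df.copy()
    for lag in lags:
        out[f"{col}_lag{lag}"] = out[col].shift(lag)  # value `lag` days earlier
    out[f"{col}_roll{roll}"] = out[col].rolling(roll, min_periods=1).mean()
    return out

# Illustrative daily temperature series
df = pd.DataFrame({"temp": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]})
feats = add_time_features(df, "temp")
```

Each candidate feature's contribution can then be checked statistically, e.g. by comparing model error with and without it, before the feature is kept.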
Delivery and progress monitoring
The entire code repository was maintained and regularly reviewed by the stakeholders. The algorithm selection phase followed a simple-to-complex approach: beginning with the simplest algorithms, e.g. statistical methods that are more compute-efficient, progressing to tree-based techniques such as XGBoost and Random Forest, with the last tests conducted on LSTMs. These decisions were influenced by the nature of the data, i.e. the time series problem, emerging patterns and volatility, among other factors.
The training and validation phase saw multiple experiments designed and systemized with MLOps tooling. A concise way of monitoring experiments and comparing results with tangible differences helped the client's research team gain total visibility. Such a monitoring and reporting system not only enabled all the stakeholders to track progress but also brought in contributions and active buy-in across both teams.
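The specific tracking tool is not named above; as a purely illustrative sketch of the underlying idea, an experiment registry only needs to record parameters and metrics per run and make runs comparable side by side. The run names and metrics below are hypothetical.

```python
import json
import time

class ExperimentLog:
    """Minimal stand-in for real experiment-tracking tooling (illustrative only)."""

    def __init__(self) -> None:
        self.runs = []

    def log_run(self, name: str, params: dict, metrics: dict) -> None:
        self.runs.append({"name": name, "params": params,
                          "metrics": metrics, "ts": time.time()})

    def best(self, metric: str, mode: str = "min") -> dict:
        pick = min if mode == "min" else max
        return pick(self.runs, key=lambda r: r["metrics"][metric])

    def report(self) -> str:
        # Flat, comparable view of all runs for stakeholder reviews
        return json.dumps([{"name": r["name"], **r["metrics"]}
                           for r in self.runs], indent=2)

log = ExperimentLog()
log.log_run("naive_baseline", {"model": "persistence"}, {"mae": 0.91})
log.log_run("xgboost_v1", {"model": "xgboost", "depth": 6}, {"mae": 0.54})
```

In practice a hosted tracking service adds artifact storage and UI dashboards on top of exactly this kind of record.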
After achieving the target metrics, we moved to the deployment stage. The data pipeline and ML models were consolidated into a single API to be integrated by the client's web development team. For interim testing and demo purposes, a basic working web UI (integrated with the API service) was also built and deployed on AWS by TotemX, a kind of proof of concept where the client's management team could provide a few input parameters and view the resulting analytics, charts and predictions.
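A minimal sketch of what such a consolidated prediction endpoint's handler might look like follows. The payload fields, validation limits and placeholder model are assumptions for illustration only, not the deployed API.

```python
def predict_handler(payload: dict) -> dict:
    """Validate the request, run the (placeholder) pipeline, return chart-ready JSON."""
    site_id = payload.get("site_id")
    horizon = payload.get("horizon_weeks", 4)
    if site_id is None or not isinstance(horizon, int) or not 1 <= horizon <= 12:
        return {"status": "error",
                "message": "site_id is required; horizon_weeks must be 1..12"}
    # Placeholder for the data pipeline's latest reading at this site
    last_observed = 0.42
    # Placeholder model: 5% weekly growth from the last observation
    preds = [round(last_observed * 1.05 ** k, 3) for k in range(1, horizon + 1)]
    return {"status": "ok", "site_id": site_id, "predictions": preds}

resp = predict_handler({"site_id": "NL-007", "horizon_weeks": 3})
```

Wrapping a handler like this in a web framework and pointing a simple charting UI at it is enough for the kind of proof-of-concept demo described above.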