Enhancing ESGai: Performance Improvements Through Parallel Processing and Structured Data Management
We're excited to announce significant performance enhancements to our ESGai solution, focusing on reducing response times and improving data organization. These upgrades represent a major step forward in our mission to deliver efficient ESG insights.
https://youtu.be/5-oBdge6Q78?si=Vb9aHx79xk3VGYAh
Key Improvements
Parallel Processing with Multiple Worker Agents
The most impactful change is our shift from sequential to parallel processing. Previously, sub-questions generated by the Manager agent were processed one after another by a single Worker agent, creating a bottleneck:
- Before: Sub-questions entered a queue, with total response time being the sum of all processing times
- Now: Three Worker agents operate simultaneously, each powered by its own GPT-4o model deployment
- Result: As long as the number of sub-questions does not exceed the three available Worker agents, total response time is determined by the longest-running sub-question rather than the sum of all processing times
This parallel architecture dramatically reduces wait times for end users, as multiple data retrievals and analyses happen concurrently rather than sequentially.
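As a rough illustration of the fan-out described above, here is a minimal Python sketch using asyncio. It is not our production code: answer_sub_question is a hypothetical stand-in for a Worker agent call, and the delays are simulated latencies rather than measurements.

```python
import asyncio
import time

NUM_WORKERS = 3  # the post describes three Worker agents


async def answer_sub_question(question: str, delay: float) -> str:
    # Hypothetical stand-in for a Worker agent call.
    # asyncio.sleep simulates the latency of the model deployment.
    await asyncio.sleep(delay)
    return f"answer to {question!r}"


async def run_workers(sub_questions: list[tuple[str, float]]) -> list[str]:
    # The semaphore caps concurrency at NUM_WORKERS, so extra
    # sub-questions wait until a worker slot frees up.
    semaphore = asyncio.Semaphore(NUM_WORKERS)

    async def guarded(question: str, delay: float) -> str:
        async with semaphore:
            return await answer_sub_question(question, delay)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(q, d) for q, d in sub_questions))


if __name__ == "__main__":
    demo = [("emissions", 0.3), ("water use", 0.1), ("board diversity", 0.2)]
    start = time.perf_counter()
    answers = asyncio.run(run_workers(demo))
    elapsed = time.perf_counter() - start
    print(answers)
    print(f"elapsed: {elapsed:.2f}s, sum of simulated delays: 0.60s")
```

With three sub-questions and three slots, the elapsed time should land close to the longest simulated delay rather than the sum.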
Mathematical Improvement in Response Time
We can quantify this improvement mathematically:
Let's define:
- n = number of sub-questions
- t = average time taken per sub-question
- s = time taken by the longest sub-question (where t ≤ s ≤ n×t)
Before: Total processing time = n × t (sequential processing)
Now: Total processing time = s (parallel processing, when n does not exceed the three available Worker agents)
Since s is typically much smaller than n×t, this represents a substantial improvement in response time. When n exceeds three, some sub-questions must wait for a free Worker, so the total is at least max(s, n×t/3). For example, with 5 sub-questions taking an average of 10 seconds each, the previous architecture would require 50 seconds. If the longest sub-question takes 15 seconds, the new architecture with three Workers needs at least max(15, 50/3) ≈ 16.7 seconds, roughly a third of the original time.
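To make the arithmetic concrete, here is a small Python sketch that compares the two totals for a given worker count. The individual durations are illustrative values chosen to match the example (5 sub-questions, 10 seconds on average, 15 seconds for the longest), and the scheduling rule (each sub-question goes to whichever worker frees up first, in queue order) is an assumption, not a description of our scheduler.

```python
import heapq


def sequential_time(durations: list[float]) -> float:
    # One worker handles every sub-question in turn.
    return sum(durations)


def parallel_time(durations: list[float], workers: int = 3) -> float:
    # Assumed rule: hand each sub-question, in queue order,
    # to the worker that becomes free earliest.
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not durations:
        return 0.0
    free_at = [0.0] * min(workers, len(durations))
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# Illustrative durations in seconds: average 10, longest 15.
EXAMPLE = [15, 10, 10, 8, 7]

if __name__ == "__main__":
    print("sequential:", sequential_time(EXAMPLE))
    print("3 workers: ", parallel_time(EXAMPLE, workers=3))
    print("5 workers: ", parallel_time(EXAMPLE, workers=5))
```

For these durations the sketch gives 50 seconds sequentially, 18 seconds with three workers, and 15 seconds with five. The total only equals the longest sub-question when every sub-question has a worker of its own.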
Structured Data Organization
We've implemented a more sophisticated data organization system:
- Virtual Folders in Blob Storage: Each company now has its own virtual folder containing its three document types (Sustainability Report, XBRL, and BRSR)
- Enhanced Vector Database Schema: The AI Search index now includes a defined schema that enables filtering capabilities
- Targeted Retrievals: Index filtering allows for more precise document retrieval, reducing noise in the results
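The layout above could be sketched roughly as follows. Everything here is hypothetical and for illustration only: the path pattern (including the per-document-type sub-folder), the ChunkRecord field names, and the example company are assumptions, not our actual naming or schema.

```python
from dataclasses import dataclass, field

# The three document types named in the post.
DOCUMENT_TYPES = ("Sustainability Report", "XBRL", "BRSR")


def blob_path(company: str, document_type: str, filename: str) -> str:
    # A virtual folder is just a "/"-delimited prefix in the blob name.
    # The company/document_type/filename pattern is an assumption.
    if document_type not in DOCUMENT_TYPES:
        raise ValueError(f"unknown document type: {document_type!r}")
    return f"{company}/{document_type}/{filename}"


@dataclass
class ChunkRecord:
    # Hypothetical shape of one indexed text chunk.
    chunk_id: str
    company_name: str      # filterable field
    document_type: str     # filterable field
    content: str
    embedding: list[float] = field(default_factory=list)


if __name__ == "__main__":
    print(blob_path("Acme Ltd", "BRSR", "brsr_2024.pdf"))
```

Keeping company name and document type as separate fields on every chunk is what lets the index filter on them before any similarity ranking happens.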
Optimized Query Processing with Enhanced Schema
The Manager agent can now pass two variables to narrow the search scope:
- Company Name: Identifies which company's documents to search
- Document Type: Specifies which of the three document types to examine
This two-stage search process significantly improves efficiency:
- First, a keyword search identifies the relevant company and document type
- Then, vector search retrieves the most relevant text chunks from that specific document
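The two stages can be sketched in plain Python over an in-memory list. This is a toy model of the idea, not the search service itself: the dictionary keys, the cosine ranking, and the top_k parameter are assumptions made for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(chunks, company, document_type, query_vector, top_k=3):
    # Stage 1: exact-match filter on company and document type.
    candidates = [
        c for c in chunks
        if c["company_name"] == company and c["document_type"] == document_type
    ]
    # Stage 2: rank only the surviving chunks by vector similarity.
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c["embedding"], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    chunks = [
        {"id": "a", "company_name": "Acme", "document_type": "BRSR", "embedding": [1.0, 0.0]},
        {"id": "b", "company_name": "Acme", "document_type": "BRSR", "embedding": [0.0, 1.0]},
        {"id": "c", "company_name": "Acme", "document_type": "XBRL", "embedding": [1.0, 0.0]},
    ]
    hits = two_stage_search(chunks, "Acme", "BRSR", [0.9, 0.1], top_k=1)
    print([h["id"] for h in hits])
```

Because the filter runs first, the similarity computation only touches chunks from the one document in scope.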
Index Schema and Search Process
As shown in the diagram, our enhanced workflow follows a structured path:
- The Manager Agent identifies both Company Name and Document Type from the user query
- These parameters are passed to AI Search to narrow the scope
- AI Search connects to Vector Search with these filters applied
- Vector Search retrieves only the most relevant chunks from the specified documents
This targeted approach means our system no longer needs to search through the entire database for each query, resulting in faster and more accurate responses.
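One way the Manager agent's two parameters could be turned into a filter is sketched below. It assumes the index is queried with an OData-style filter expression (as Azure AI Search accepts) and that the filterable fields are named company_name and document_type; both the field names and the helper functions are hypothetical.

```python
def odata_literal(value: str) -> str:
    # OData string literals use single quotes; an embedded single
    # quote is escaped by doubling it.
    return "'" + value.replace("'", "''") + "'"


def build_filter(company_name: str, document_type: str) -> str:
    # Field names company_name and document_type are assumptions.
    return (
        f"company_name eq {odata_literal(company_name)}"
        f" and document_type eq {odata_literal(document_type)}"
    )


if __name__ == "__main__":
    print(build_filter("Acme Ltd", "BRSR"))
```

The resulting string would be sent alongside the vector query, so that similarity ranking only considers chunks matching both values.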
Impact on User Experience
These improvements deliver a significantly enhanced user experience:
- Faster Response Times: Users receive comprehensive answers in a fraction of the time
- More Relevant Results: Structured data organization and filtering lead to more precise information retrieval
- Scalable Architecture: The system is now better positioned to handle additional companies and documents as we expand
As we continue to refine ESGai with additional funding, these architectural improvements provide a solid foundation for scaling our solution to serve more companies and handle increasingly complex ESG queries.