Hi, I'm Jambo, a Microsoft Student Ambassador for the Department of Applied Mathematics. First and foremost, this article is not a tutorial on regression analysis. Instead, it shares my thoughts on applying Phi3-vision to regression analysis. While some regression theory will be touched upon, the focus is not on the theory itself. The main goal here is to share how to use Phi3-vision, so even if you don't fully grasp the theory behind regression analysis, you'll still find this article accessible.
What is Linear Regression
Linear regression is a method used to analyze and predict data. Simply put, it tries to describe the relationship between X and Y using a straight line.
In regression theory, we can find the best-fitting line when the data meets the following conditions:
- The residuals (the distance between Y and the line) should follow a normal distribution, meaning the Y points should be evenly distributed on both sides of the line, not too far from it, and not clustered on one side.
- The variance of the residuals should be constant, indicating that the spread of the data points is consistent.
These conditions are ideal and hard to fully meet in reality. Thus, regression analysis often involves transforming the data to make it as close to these conditions as possible and finding potential variables.
Example
Here's a simple example to show the data distribution:
From the graph, we can easily draw two conclusions:
1. As X increases, Y tends to increase as well.
2. The points on the left are more concentrated than those on the right, meaning that as X increases, the range of Y also increases.
This graph does not meet the abovementioned conditions, so directly fitting a regression line to this data might not yield ideal results.
The following image shows the result of calculating the regression line without adjustments. Pay attention to the R-squared value; the closer it is to 1, the better the model.
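To make the R-squared value concrete, here is a minimal sketch of an ordinary least-squares fit written with the Python standard library only. The function name `fit_line` and the synthetic data (whose spread grows with X, roughly like the example above) are my own assumptions for illustration; this is not the article's data set or code.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Synthetic data (an assumption, not the article's data): noise grows with x.
rng = random.Random(0)
xs = [1 + 0.1 * i for i in range(200)]
ys = [2 * x + rng.gauss(0, 0.8 * x) for x in xs]

intercept, slope, r2 = fit_line(xs, ys)
print(f"intercept={intercept:.3f}, slope={slope:.3f}, R^2={r2:.3f}")
```

With an intercept in the model, R-squared always lies between 0 and 1, which is why it is convenient for comparing a fit before and after a transformation.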
Through some transformation methods, we can obtain a better model. There are many theories and tests in regression analysis that determine which transformation methods to use, but real-world applications still require subjective judgment and experience.
Why I Think Phi3-vision is Suitable for Regression Analysis
Although the theory behind regression analysis is very rigorous, real-world applications often require subjective judgment. Real-world data is not as perfect as the theory suggests and is full of unpredictable variables. Therefore, regression analysis is not about finding causal relationships between data but providing a reliable way to "guess" the data.
As the saying goes, "All models are wrong." We're merely looking for a model that fits our data, which requires a lot of subjective judgment and industry experience. Phi3-vision can quickly give us some "subjective" judgments based on charts, which is very helpful in practical applications. Once we have a "subjective" judgment, we can use some tests from regression theory to verify whether this judgment is reasonable.
The Process of Regression Analysis
This is a simplified flowchart. Although it still looks complex, you don't need to fully understand it. Just know that it indicates we can break down the process, decompose a complex problem into multiple simple sub-problems, and then follow the flowchart step by step.
Implementing State Diagram-Based Regression Process with LangGraph + Phi3-vision
We can easily implement regression analysis according to the above diagram using LangGraph. Here is a state diagram automatically generated by LangGraph based on the added nodes.
If you don't have a local environment for running Phi3-vision, we can also conveniently use LangChain's Nvidia NIM integration to call the model. This way, we can quickly test whether a specific model is suitable for the application. You can find detailed information about Phi3-vision on Nvidia NIM, and after logging in, you can find your API key in the following location.
import os
from langchain_nvidia_ai_endpoints import ChatNVIDIA
# Set your own NVIDIA API key here (intentionally left empty).
os.environ["NVIDIA_API_KEY"] = ""
llm = ChatNVIDIA(model="microsoft/phi-3-vision-128k-instruct")
For branching routes in the nodes, we only need to ask Phi3-vision a straightforward question. For example, in the Constant variance node, we would ask: "You are a data analysis expert. Does this set of data have constant variance? You only need to answer True or False."
In NIM's online test, we can see that Phi3-vision's answer to the above example is False. We just need to decide the next step based on the True or False answer.
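Deciding the next step from a True or False answer can be sketched as below. This is only an illustration under my own assumptions: the helper names `parse_bool_answer` and `route`, and the fallback label `"retry"`, are hypothetical and are not taken from the linked implementation.

```python
import re

def parse_bool_answer(text):
    """Return True/False if the reply contains exactly one of the two words, else None."""
    found = {m.lower() for m in re.findall(r"\b(true|false)\b", text, flags=re.IGNORECASE)}
    if found == {"true"}:
        return True
    if found == {"false"}:
        return False
    return None  # missing or ambiguous answer

def route(reply, if_true, if_false, if_unclear="retry"):
    """Map the model's reply to the name of the next node (names are hypothetical)."""
    answer = parse_bool_answer(reply)
    if answer is None:
        return if_unclear
    return if_true if answer else if_false

# Example with made-up node names:
print(route("False", if_true="node_a", if_false="node_b"))
```

Returning `None` for unclear replies keeps the branch explicit, so the caller can re-ask the question or hand over to a human instead of silently picking a side.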
When checking whether the data meets the normal distribution criterion, we will write a program to automatically generate a Q-Q plot (a type of chart used to determine if the data follows a normal distribution) from the data, and then ask Phi3-vision: "You are a data analysis expert. The attached figure is the Q-Q plot of this set of data. Does this set of data conform to the normal hypothesis? You only need to answer True or False."
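As a rough sketch of what such a program computes, the points of a normal Q-Q plot can be generated with the standard library alone. The function name `qq_points` and the plotting positions `(i - 0.5) / n` are my assumptions (other conventions exist); drawing the chart and sending it to the model are left out.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(residuals)
    observed = sorted(residuals)
    mu, sigma = mean(observed), stdev(observed)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    # Standardize the sample so both axes are on the same scale.
    standardized = [(r - mu) / sigma for r in observed]
    return list(zip(theoretical, standardized))

points = qq_points([-1.2, 0.3, 0.1, -0.4, 2.0, -0.8, 0.9])
for t, o in points:
    print(f"{t:6.3f} {o:6.3f}")
```

If the data are close to normal, the points fall near the diagonal line y = x; strong curvature at the ends suggests heavy tails or skew.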
We received a True answer. According to the flow, we know that the next step is to perform a weighted regression algorithm. However, from the graph, we can see that there are multiple weighting methods. We can list these methods and automatically generate the required charts with a program, then let Phi3-vision give us the most likely option.
In this example, Phi3-vision considers the first weighting algorithm to be the most suitable. Next, we just need to automatically jump to the corresponding algorithm function, and the remaining work is to let the program automatically calculate the result.
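For readers unfamiliar with weighted regression, here is a minimal sketch of weighted least squares for a single predictor. The candidate weight functions `1/x` and `1/x^2` are common textbook choices that I am assuming for illustration; the article does not state which weighting methods appear in its flowchart.

```python
def weighted_fit(xs, ys, weights):
    """Weighted least squares for y = a + b*x, minimizing sum(w * residual**2)."""
    sw = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / sw
    mean_y = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical candidates: down-weight points where the spread is larger.
candidate_weights = {
    "1/x": lambda x: 1 / x,
    "1/x^2": lambda x: 1 / x ** 2,
}

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 5.6, 8.9, 9.2]
for name, weight_fn in candidate_weights.items():
    a, b = weighted_fit(xs, ys, [weight_fn(x) for x in xs])
    print(f"{name}: intercept={a:.3f}, slope={b:.3f}")
```

Storing the candidates in a dictionary means the option chosen by the model can be looked up by name and passed straight to the fitting function.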
The specific code implementation can be found here. The following image shows the result of using LangGraph to perform regression analysis on the initial example. You can see that the R-squared value increased from 0.408 to 0.521, and the entire process took less than 5 seconds. If the model is inferred on a locally deployed environment, the time might be even shorter.
Key Points in the Example
Decomposing Problems
For complex projects like regression analysis, no single large model can provide a complete answer in one go. Training a large model that can complete a project in a few steps (chain of thought, CoT) is also very challenging, as it requires collecting a vast amount of complete analysis process data and possibly describing why one method is chosen over others.
Our ability to quickly and automatically solve such problems is largely because we have decomposed the problem into sufficiently detailed steps. Each node is a very simple question, and Phi3-vision only needs to answer True or False. This allows us to conveniently verify whether Phi3-vision's answers are reasonable.
Since we have decomposed the problem into sufficiently detailed steps, we can write targeted prompt texts or functions for each question, which better guides Phi3-vision's answers.
Advantages of Phi3-vision
Because the problems are decomposed into very simple steps, small models with fast responses are more practical than large models that excel at answering complex questions. Additionally, Phi3-vision has a 128k context capacity, allowing it to provide numerous reference examples when handling simple questions. Since the questions are known, we can hard-code examples, eliminating the need to wait for embedding models and vector databases to match suitable examples.
Advantages of State Graph
Regression analysis is complex not just because we need to find correlations between different factors but also because we need to continually optimize the model, transform the data, and further optimize based on the modified results. Many simple scenarios also rely heavily on loops, such as continually checking if enough information has been obtained to proceed or continually adjusting the model's parameters. Looping capability is a shortcoming of pipeline tools, but state graphs can handle this well.
Most mainstream LLM-based visualization tools are pipeline-based, such as prompt flow and langflow. When looping is required, they rely on agents or code implementation. This somewhat limits our operations and can even complicate some problems. State graphs and pipelines are not mutually exclusive; state graphs can serve as a higher-level abstraction of pipelines, connecting pipelines that handle different tasks. A state graph without loops reduces to a pipeline, which makes state graphs the more general tool.
While loops in state graphs can cause some issues, even agents can encounter infinite loop problems. State graphs, compared to agents, can more intuitively show the entire process, making it easier to debug and validate.
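To illustrate the loop handling described in this section, here is a tiny library-free state graph runner. It is deliberately not LangGraph's API; the function `run_graph`, the node names, and the toy "halve the spread until it is small enough" task are all hypothetical. It shows two ideas from the text: a loop between nodes, and a recorded trace that makes the run easy to inspect.

```python
def run_graph(nodes, edges, start, state, max_steps=20):
    """Run nodes until a router returns None; max_steps guards against infinite loops."""
    current = start
    trace = []
    for _ in range(max_steps):
        trace.append(current)
        state = nodes[current](state)      # node updates the state
        current = edges[current](state)    # router picks the next node
        if current is None:
            return state, trace
    raise RuntimeError(f"stopped after {max_steps} steps; trace={trace}")

# Toy example: loop between "check" and "transform" until the spread is acceptable.
nodes = {
    "check": lambda s: {**s, "ok": s["spread"] <= 1.0},
    "transform": lambda s: {**s, "spread": s["spread"] / 2, "rounds": s["rounds"] + 1},
    "fit": lambda s: {**s, "fitted": True},
}
edges = {
    "check": lambda s: "fit" if s["ok"] else "transform",
    "transform": lambda s: "check",
    "fit": lambda s: None,
}

final_state, trace = run_graph(nodes, edges, "check", {"spread": 4.0, "rounds": 0})
print(trace)
print(final_state)
```

The step limit turns a potential infinite loop into an explicit error that includes the visited nodes, which is the kind of structured record that is harder to get from a free-running agent.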
Differences from Agents
Agents allow models to autonomously decide the next step by summarizing the current information, giving the model the ability to make decisions and actively gather information. Moreover, multi-agent methods can solve complex problems, but these methods are more suited for exploring solutions when no clear solution exists. In fields with established methodologies, using Agents may seem redundant and add system complexity.
Since each step is autonomously led by the model, it needs to generate a lot of textual thought processes to make the right decision, leading to longer system runtimes, significant token consumption, and the risk of going off track. Although prompt adjustments can influence the model's decisions, debugging prompts is more of a black box compared to training models – it's hard to know which part of the model is affected by the prompts.
In the method discussed in this article, engineers pre-design the framework process, and the model only plays a role in branch selection. Automated verification ensures the model's choices are correct. The entire system is based on a state diagram, allowing us to structurally record the entire process, not just the textual output of the LLM, facilitating subsequent debugging and validation. Since the problems the model encounters are controllable and the output is merely Boolean values or branch choices, we can use examples to adjust the model's choices. Even if the model gives an incorrect result, the process remains within a predictable range, making it easy to identify issues.
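One possible form of such automated verification, sketched under my own assumptions, is a crude variance-ratio check on the residuals. It is only loosely inspired by the Goldfeld-Quandt idea of comparing residual variance across two subsamples; the threshold of 3.0 is an arbitrary illustrative value, not a statistical critical value, and the function names are hypothetical.

```python
def variance_ratio(xs, residuals):
    """Residual variance in the upper half of x divided by that in the lower half."""
    pairs = sorted(zip(xs, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]

    def mean_square(values):
        return sum(v * v for v in values) / len(values)

    return mean_square(high) / mean_square(low)

def agrees_with_model(model_says_constant, xs, residuals, threshold=3.0):
    """True if the model's True/False answer matches the crude numeric check."""
    ratio = variance_ratio(xs, residuals)
    looks_constant = 1 / threshold <= ratio <= threshold
    return model_says_constant == looks_constant

xs = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [1, -1, 1, -1, 3, -3, 3, -3]  # spread grows with x
print(variance_ratio(xs, residuals))
print(agrees_with_model(False, xs, residuals))
```

When the check and the model disagree, the graph can route to a retry or flag the step for manual review rather than continuing down the wrong branch.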
Conclusion
By using Phi3-vision and LangGraph, we can break down the complex process of regression analysis into multiple smaller problems and solve them step by step. Phi3-vision provides "subjective" judgments at certain nodes, and we verify these with programs implementing the theories. Regression analysis is an iterative process, and solving small problems gradually can lead to a complete answer.
This approach not only allows us to stop at any time to review the data and intervene manually but also offers high scalability. Since Phi3-vision handles simple problems, we can easily collect enough data for fine-tuning or RAG, leveraging its advantages in multi-step analysis. Overall, Phi3-vision's rapid response and high context capacity make it well-suited for fields that are complex but have established methodologies.
Of course, this approach is not suitable for exploring solutions in uncharted territory, as we need to design the entire process upfront. However, in fields where the problems have known solutions, using an Agent might be redundant, while using Phi3-vision and state graphs is more efficient. These two approaches are not mutually exclusive; we can choose the appropriate method based on the specific situation.
This is just one of my ideas, and I hope it can inspire you. The absence of graph-based visualization tools today might be due to some considerations I haven't thought of, or perhaps I don't know enough about existing tools. If you have other ideas or suggestions, feel free to leave a comment.