Need Guidance on Splitting and Sequencing Code for Language Conversion (Example: C#)


I'm working on a project that converts .NET Framework C# code to .NET Core Web API, using LangChain to communicate with Azure OpenAI. Our main challenge is that many source files exceed the model's 4000-token limit. Chunking the code and processing it with a refinement chain works up to a point, but we are having difficulty preserving the order of code lines through the conversion.


The problem appears once the code is split into multiple chunks. We can keep each chunk under the token limit, but keeping the converted lines in their original order is hard. Unlike summarization, where the output is much shorter than the input, code conversion produces output that is about as long as the input, or longer.
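For illustration, here is a minimal Python sketch of the line-based chunking described above, with each chunk tagged by its position so the original order can be restored later. The names `Chunk`, `estimate_tokens`, `split_by_lines`, and `reassemble` are hypothetical, and the four-characters-per-token estimate is only an assumption standing in for the model's real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int       # position of the chunk in the original file
    start_line: int  # 1-based number of the first source line in the chunk
    text: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Replace with a real tokenizer.
    return max(1, len(text) // 4)


def split_by_lines(source: str, max_tokens: int) -> list[Chunk]:
    """Split on line boundaries so that no line is ever cut in half.

    A single line larger than max_tokens still becomes its own chunk.
    """
    chunks: list[Chunk] = []
    current: list[str] = []
    current_tokens = 0
    start_line = 1
    for line_no, line in enumerate(source.splitlines(keepends=True), start=1):
        line_tokens = estimate_tokens(line)
        # Close the current chunk before it would exceed the budget.
        if current and current_tokens + line_tokens > max_tokens:
            chunks.append(Chunk(len(chunks), start_line, "".join(current)))
            current, current_tokens, start_line = [], 0, line_no
        current.append(line)
        current_tokens += line_tokens
    if current:
        chunks.append(Chunk(len(chunks), start_line, "".join(current)))
    return chunks


def reassemble(chunks: list[Chunk]) -> str:
    # Sort by the recorded index so the order survives out-of-order processing.
    return "".join(c.text for c in sorted(chunks, key=lambda c: c.index))
```

Because each chunk carries its own `index` and `start_line`, the pieces can be put back in source order regardless of the order in which they were processed.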


To address this, we implemented a refinement chain that feeds the model's current output, together with the next chunk, back in as the next input. This doesn't fully solve the problem: after a certain number of iterations the model's output reaches its limit. Since the converted code must match or exceed the length of the input code, we have not found an approach that preserves both the order and the full length of the code during conversion.
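To make the setup concrete, the refinement loop described above can be sketched roughly as follows. This is a simplified stand-in, not the actual LangChain chain: `refine_convert`, `fake_model`, and the prompt wording are hypothetical, and `fake_model` just echoes its input so the sketch runs without a model. It records the prompt size at each step to show how the carried-over output keeps growing.

```python
from typing import Callable

# Hypothetical prompt wording, used only for this sketch.
PREVIOUS_MARKER = "Converted code so far:\n"
NEXT_MARKER = "\nNext source chunk to convert and append:\n"


def refine_convert(
    chunks: list[str], call_model: Callable[[str], str]
) -> tuple[str, list[int]]:
    """Feed the accumulated output plus the next chunk to the model each step."""
    converted = ""
    prompt_sizes: list[int] = []
    for chunk in chunks:
        prompt = PREVIOUS_MARKER + converted + NEXT_MARKER + chunk
        prompt_sizes.append(len(prompt))  # grows as `converted` grows
        # The model is expected to return ALL converted code so far.
        converted = call_model(prompt)
    return converted, prompt_sizes


def fake_model(prompt: str) -> str:
    # Stand-in for the real model call: returns the previous output with the
    # new chunk appended unchanged, so only the data flow is demonstrated.
    body = prompt[len(PREVIOUS_MARKER):]
    previous, _, chunk = body.partition(NEXT_MARKER)
    return previous + chunk


result, sizes = refine_convert(["class A {}\n", "class B {}\n", "class C {}\n"], fake_model)
print(sizes)  # each prompt is larger than the one before
```

Every iteration has to carry the entire converted output forward and ask the model to reproduce it, so both the prompt and the expected response grow with each chunk until they run into the limit.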


Request for Guidance:
I'm looking for guidance from the community on how to split and sequence large blocks of code for conversion so that the converted code stays coherent and keeps the original order of code lines. Any insights, suggestions, or alternative approaches would be greatly appreciated.


Thank you in advance for your assistance!
