SOLVED

Convolution of a vector and a kernel with offsetting using dynamic arrays

Copper Contributor

Hey guys, hope you're doing great!

I have been playing with some data at work and wondered what would happen if I made a kernel to convolve with my main vector, smoothing it with a moving-average-like operation, as shown in this awesome video by Grant at 3Blue1Brown.


I started searching the forum for a solution and came across one made by @Peter Bartholomew while discussing Fourier Transforms, but I found it too complicated for my use, as it needed recursion and a lot of defined names. So I went for my own solution, which involves offsetting the vector while the kernel stays still, the opposite of what is shown in the video.

The first thing I thought was that I needed a sequence I could use to offset the vector, so I could SUMPRODUCT the offset vector with the kernel. Since the size of the convolution must, by definition, be:
rows(vector) + rows(kernel) - 1

I made a formula to generate this sequence:

 

=SEQUENCE(
    ROWS(vector) + ROWS(kernel) - 1;
    1;
    -(ROWS(kernel) - 1);
    1
)

 

*I'm using a vector of 100 rows and a kernel of 4 rows*

This sequence starts at -3 and runs to 99 in steps of one, thus obeying the convolution size definition. But why do I need it in the first place? So I can generate an array of height 4 (the kernel size) that offsets the vector, taking the sequence as its row-offset argument:
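As a cross-check outside Excel, the same offset sequence can be sketched in Python (a minimal illustration; the lengths 100 and 4 are the example sizes from the post):

```python
def offset_sequence(n, k):
    # Mirrors =SEQUENCE(n + k - 1; 1; -(k - 1); 1):
    # starts at -(k - 1), steps by 1, length n + k - 1.
    return list(range(-(k - 1), n))

seq = offset_sequence(100, 4)   # vector of 100 rows, kernel of 4 rows
# first value -3, last value 99, length 103
```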

 

=TRANSPOSE(OFFSET(vector;A4;0;4))

 

where A4 is the first result of the sequence formula. Now I can drag the formula down for all numbers in the sequence, and it generates the output I need to transpose again and then SUMPRODUCT:

 

=SUMPRODUCT(TRANSPOSE(F4#);kernel)

 

where F4# is the output of the transpose/offset formula above. I then drag down for each result and, voilà, the convolution is done (assuming the kernel was already flipped).
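Each drag-down row amounts to one window of the (conceptually zero-padded) vector dotted with the kernel. A minimal Python sketch of that step, assuming the kernel is already flipped as in the post:

```python
def windows(vector, k):
    # One k-length window per offset -(k-1)..n-1, with zeros where
    # OFFSET would fall outside the vector (the zero padding the
    # spreadsheet layout relies on).
    n = len(vector)
    get = lambda i: vector[i] if 0 <= i < n else 0
    return [[get(off + j) for j in range(k)] for off in range(-(k - 1), n)]

def convolution_row(window, kernel):
    # SUMPRODUCT of one window with the kernel
    return sum(w * c for w, c in zip(window, kernel))

[convolution_row(w, [1, 1]) for w in windows([1, 2, 3], 2)]  # -> [1, 3, 5, 3]
```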

Now, to make it dynamic, I first made a formula that takes the vector and the kernel and automatically generates the sequence:

 

=LAMBDA(vector1; kernel1;
    LET(
        vector_r; ROWS(vector1);
        kernel_r; ROWS(kernel1);
        convolution_r; ((vector_r) + (kernel_r) - 1);
        offset_sequence; SEQUENCE(convolution_r;1;-(kernel_r - 1));
        offset_sequence
    )
)(vector; kernel)

 

And a second formula to compute the results:

 

=MAP(
    M4#;
    LAMBDA(seq;
        SUMPRODUCT(
            OFFSET(
                vector;
                seq;
                0;
                ROWS(kernel)
            );
            kernel
        )
    )
)

 

where M4# is the result of the first dynamic formula

This does the trick but has room for improvement. First of all, the vector needs space to be padded with zeroes, which should be handled inside the formula, but I couldn't do it.

Second, I should be able to combine the two formulas; at least in my head it should be possible. I tried to do it but it didn't work... I'll be leaving my test file to show more properly what I have done.

The goal here is to learn how to properly pad arrays for this kind of offsetting and how to combine these formulas if possible. And if this approach is not proper at all, I would like to know...
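For reference, both improvement points (padding handled internally, and the two formulas fused into one) can be expressed in a single function. This is a Python sketch of the idea, not an Excel formula; in Excel terms it would correspond to nesting the SEQUENCE inside the MAP within one LET:

```python
def convolve_combined(vector, kernel):
    # Fused version: builds the offset sequence, pads with k-1 zeros on
    # each side so no offset reaches outside the data, and maps every
    # offset to its SUMPRODUCT in one pass.
    n, k = len(vector), len(kernel)
    padded = [0] * (k - 1) + list(vector) + [0] * (k - 1)
    return [sum(padded[off + (k - 1) + j] * kernel[j] for j in range(k))
            for off in range(-(k - 1), n)]

convolve_combined([1, 2, 3], [1, 1])  # -> [1, 3, 5, 3]
```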

Also note: I use semicolons instead of commas in the formulas because I live in Brazil and the formula syntax here works like that for some reason...

2 Replies
best response confirmed by MarkolaKrai (Copper Contributor)
Solution

@MarkolaKrai 

The video to which you provided the link is very good.  I had watched others by the same presenter.

 

"too complicated for my use as it needed recursion"

I try to avoid recursion where possible but it can be valuable, especially when it is not possible to predict in advance how many steps will be needed.  I have a non-recursive version of the convolution but it is no simpler.

 

"and a lot of defined names",  I have avoided direct cell referencing for the past 8 years, preferring instead to use names both for identifying ranges and to define formulas for evaluation.  There are plusses, I no longer need to use relative references, so a formula will evaluate to the same result wherever I place it, and I no longer need to worry about following precedents to help decode a formula.

 

"so I went for my own solution which involved offsetting the vector while the kernel stays which is the opposite of what is shown in the video"  One learns more by going one's own way!  The convolution has an inherent symmetry allowing the role of the value array and the kernel to be reversed.

 

I did try using INDEX in place of OFFSET in case the input range was being dereferenced to produce an array rather than a reference but without success.  Sorry about that.

 

An advantage of Lambda functions is that they are not as difficult to deploy as they are to develop!   The first file attached has links to the theoretical basis, which is interesting but something of a nightmare nonetheless.  I also applied the Convolve function to a ModelOff problem recently, as presented by @Diarmuid Early .  The links to his problem solution show how traditional techniques may be used, especially if you are a World-class Excel modeller!

@Peter Bartholomew 

Hello Peter, it's been a while since I sent this post, and on top of that, you were the only one who responded. Thank you very much for that. Anyway, today I can say that I've reached a conclusion for a single formula that performs convolution without the need for recursion or complex mathematics.

Unfortunately, the formula turned out to be quite intimidating and probably way slower, but at least I can say that I've learned new concepts like the single function and lambda thunking.

CONVOLVE = LAMBDA(vector1; vector2;
    LET(
        prematrix; vector1 * TRANSPOSE(vector2);
        MATRIX; EXPAND(prematrix; ROWS(prematrix); (MAX(COLUMNS(prematrix); ROWS(prematrix)) - 1) * 2; 0);
        out; TRANSPOSE(
            BYCOL(
                IFERROR(
                    INDEX(
                        MATRIX;
                        SEQUENCE(ROWS(MATRIX));
                        EXPAND_THUNK_H(
                            BYCOL(
                                SEQUENCE(; COLUMNS(MATRIX));
                                LAMBDA(a; THUNK(REVERSE_VERTICAL(SEQUENCE(@a; 1; 1; 1))))
                            )
                        )
                    );
                    ""
                );
                LAMBDA(a; SUM(a))
            )
        );
        INDEX(out; SEQUENCE(ROWS(vector1) + ROWS(vector2) - 1))
    )
)

where:

REVERSE_VERTICAL = LAMBDA(array;
    LET(sequencia; SEQUENCE(ROWS(array)); SORTBY(array; sequencia; -1))
);;

THUNK = LAMBDA(X;LAMBDA(X));;

EXPAND_THUNK_H = LAMBDA(arraythunks;
    IFERROR(DROP(REDUCE(0; arraythunks; LAMBDA(init; thunk; HSTACK(init; thunk())));; 1); "")
);;
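The prematrix/anti-diagonal idea behind CONVOLVE can be sketched in Python (an illustration of the concept, not a translation of the Excel machinery): the outer product vector1 * TRANSPOSE(vector2) is formed, and each output element is the sum of one anti-diagonal, which is what the INDEX/BYCOL thunking extracts:

```python
def convolve_outer(v1, v2):
    # prematrix[i][j] = v1[i] * v2[j]; output[d] sums the anti-diagonal
    # i + j == d, giving the full convolution of length n + m - 1.
    n, m = len(v1), len(v2)
    prematrix = [[x * y for y in v2] for x in v1]
    return [sum(prematrix[i][d - i] for i in range(n) if 0 <= d - i < m)
            for d in range(n + m - 1)]

convolve_outer([1, 2, 3], [4, 5])  # -> [4, 13, 22, 15]
```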

These are formulas that I learned from the big names around here in the community. Thank you all very much, especially @tboulden @Sergei Baklan @aaltomani @Peter Bartholomew 

Anyway, I can now say that I have decided to move on and use the solution you provided, which is far more complex to me (for now at least) but just works better.
