For performance and locking reasons, changing a column's data type with ALTER COLUMN can be a long-running operation.
Suppose we have a table PRU with two columns. One is a column called id with type bigserial. In the second column called A we have integer data currently saved as Text type. Let's say we want to change the type of column A to Integer.
If you'd like to follow along with an example of this scenario, let's first create a table and generate data for it.
CREATE TABLE PRU (id bigserial, A TEXT);
INSERT INTO PRU (A) VALUES ('111');
INSERT INTO PRU (A) VALUES ('111');
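Generate rows until the table holds about 2M rows by repeating the following statement, which doubles the row count on each run. Copying only column A lets id take new values from its sequence instead of duplicating the existing ids:
INSERT INTO PRU (A) SELECT A FROM PRU;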
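If you would rather not rerun the statement by hand, the repetition could be scripted. The following is a minimal sketch, not part of the original procedure: it assumes the PRU table created above and uses an anonymous PL/pgSQL block that keeps doubling the table until it reaches the 2M target. The variable name `n` is only illustrative.

```sql
-- Sketch: double the table until it holds at least 2,000,000 rows.
-- Only column A is copied, so id draws fresh values from its sequence.
DO $$
DECLARE
    n bigint;  -- current row count (illustrative name)
BEGIN
    LOOP
        SELECT count(*) INTO n FROM pru;
        EXIT WHEN n >= 2000000;
        INSERT INTO pru (a) SELECT a FROM pru;
    END LOOP;
END
$$;
```

Starting from the two rows inserted earlier, the loop stops once the row count has doubled past the target.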
We want to change column A's datatype:
ALTER /*optionA*/ TABLE PRU ALTER COLUMN A TYPE INTEGER USING A::INTEGER;
We can review stats for the command above with the following query (this requires the pg_stat_statements extension):
SELECT * FROM pg_stat_statements WHERE query like '%optionA%';
This method is the easiest one, but it can generate high contention because ALTER TABLE holds an ACCESS EXCLUSIVE lock on the table for the whole operation, blocking both reads and writes. That lock can surface as errors in the application, so you might have to stop your application to perform this type of long-running operation.
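One way to limit the impact while waiting for that lock is to cap how long the statement may wait for it. The sketch below is an optional variation on the command above, not something the original procedure prescribes. The 5-second value is an arbitrary assumption. Note that lock_timeout only bounds the time spent waiting to acquire the lock; it does not shorten the rewrite itself once the lock is held.

```sql
-- Sketch: give up quickly if the table lock cannot be acquired,
-- instead of queueing behind other sessions and blocking new ones.
BEGIN;
SET LOCAL lock_timeout = '5s';  -- assumed value, tune for your workload
ALTER TABLE pru ALTER COLUMN a TYPE integer USING a::integer;
COMMIT;
```

If the timeout fires, the ALTER fails with an error and the transaction can be rolled back and retried later.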
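Another approach to change the datatype of the column could be to add an auxiliary column of the new type, keep it in sync with a trigger, backfill it in small batches, and finally swap it in for the original column.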
The advantage of this method is that you have more control over the process. It can be executed over multiple hours or days as needed.
--
-- Add 2 auxiliary columns
--
ALTER TABLE PRU ADD COLUMN A1 INTEGER, ADD COLUMN A1_CHANGED BOOLEAN;

--
-- Trigger to take care of ongoing changes from the applications
--
CREATE OR REPLACE FUNCTION set_a1() RETURNS TRIGGER AS $func$
BEGIN
  IF (TG_OP='INSERT') THEN
    NEW.a1:=NEW.a::integer;
  ELSEIF (TG_OP='UPDATE') THEN
    IF (NEW.a <> OLD.a) THEN
      NEW.a1:=NEW.a::integer;
    ELSEIF (NEW.a is null and OLD.a is not null) THEN
      NEW.a1:=null;
    ELSEIF (NEW.a is not null and OLD.a is null) THEN
      NEW.a1:=NEW.a::integer;
    END IF;
  END IF;
  NEW.a1_changed:=true;
  RETURN NEW;
END
$func$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS set_a1 ON PRU;
CREATE TRIGGER set_a1 BEFORE INSERT OR UPDATE ON pru
  FOR EACH ROW EXECUTE PROCEDURE set_a1();

--
-- Update statement that limits the number of rows updated in a single transaction
--
-- This statement must be repeated multiple times until all rows are updated
--
UPDATE /*optionB*/ PRU SET A1=A::INTEGER, A1_CHANGED=true
  WHERE id IN (SELECT id FROM PRU WHERE A1_CHANGED is null limit 100000);

--
-- Check the process
--
-- Current changed rows:
SELECT COUNT(1) FROM PRU WHERE A1_CHANGED=true;
-- Current pending rows:
SELECT COUNT(1) FROM PRU WHERE A1_CHANGED is null;

--
-- Final work
--
-- After no rows need changes, we can switch the columns
BEGIN WORK;
LOCK TABLE PRU IN SHARE MODE;
ALTER /*optionB*/ TABLE PRU DROP COLUMN A;
ALTER /*optionB*/ TABLE PRU DROP COLUMN A1_CHANGED;
ALTER /*optionB*/ TABLE PRU RENAME A1 TO A;
DROP TRIGGER set_a1 ON PRU;
DROP FUNCTION set_a1();
COMMIT WORK;
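The batched UPDATE has to be repeated until no pending rows remain. As a rough sketch of how that repetition could be automated, the block below loops over the same statement and commits after each batch so that every batch stays a separate transaction. It assumes PostgreSQL 11 or later, where COMMIT is allowed inside a DO block, and it must be run outside an explicit transaction block. The variable name `batch_rows` is illustrative.

```sql
-- Sketch: repeat the batched backfill until nothing is left to update.
-- Requires PostgreSQL 11+ and must not be wrapped in BEGIN ... COMMIT.
DO $$
DECLARE
    batch_rows bigint;  -- rows updated by the last batch (illustrative name)
BEGIN
    LOOP
        UPDATE /*optionB*/ pru
           SET a1 = a::integer, a1_changed = true
         WHERE id IN (SELECT id FROM pru
                      WHERE a1_changed IS NULL
                      LIMIT 100000);
        GET DIAGNOSTICS batch_rows = ROW_COUNT;
        EXIT WHEN batch_rows = 0;
        COMMIT;  -- end this batch's transaction before starting the next
    END LOOP;
END
$$;
```

Running the batches from an external scheduler instead gives the same result and makes it easier to pause between batches during busy hours.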
We can review stats for the optionB statements with the following query:
SELECT * FROM pg_stat_statements WHERE query like '%optionB%';
The tradeoff here is that we can observe more resources used and more space allocated to the table. However, the application keeps working most of the time without high-contention locking.
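To put numbers on that tradeoff, the two approaches could be compared side by side. This is only a sketch: it assumes the pg_stat_statements extension is installed and that both sets of statements were tagged with the optionA and optionB comments used above. Timing columns are left out because their names differ between PostgreSQL versions.

```sql
-- Total on-disk size of the table, including indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('pru')) AS total_size;

-- Call counts, rows touched, and buffer usage for both approaches.
SELECT query, calls, rows,
       shared_blks_hit, shared_blks_read, shared_blks_written
FROM pg_stat_statements
WHERE query LIKE '%optionA%' OR query LIKE '%optionB%'
ORDER BY query;
```

Checking the table size before and after each approach shows how much extra space the auxiliary columns and the updated row versions take.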
In either case, you should be careful to first understand how stable the data in the column is so you don't miss new changes.
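For the second approach, one possible way to confirm that no change was missed is to run a pair of checks just before the final column swap. This sketch assumes the auxiliary columns A1 and A1_CHANGED from the example are still in place and that every value in A casts to integer. Both counts should be zero before switching.

```sql
-- Rows whose backfill is still pending.
SELECT count(*) AS pending
FROM pru
WHERE a1_changed IS NULL;

-- Rows where the new column disagrees with the original value
-- (IS DISTINCT FROM also treats NULL vs non-NULL as a mismatch).
SELECT count(*) AS mismatched
FROM pru
WHERE a1 IS DISTINCT FROM a::integer;
```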