Categorical variables in a dataset: how do I clean them?


Hi all,

I need to clean my dataset of missing values. The feature "Last_New_Job" gives the number of years since the last job.

There are about 1,000 missing values, which I would like to replace with the median value.

 

The problem is that the column contains values such as "<4", which makes it a string column. How can I convert a value like ">4" to a number? Or should I just replace the missing values with the mode?
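To make the question concrete, here is a minimal sketch of the median approach in plain Python. Everything in it is an assumption for illustration: the helper names `to_years` and `impute_median` are hypothetical, the sample values are made up, and mapping ">4" (or "<4") to the bare number 4 is only one possible convention for an open-ended bucket. Any entry that is not a number with an optional "<" or ">" prefix is treated as missing here, which may not be right for every column.

```python
import re
from statistics import median


def to_years(value):
    """Return the entry as a float, or None if it is missing or unparseable.

    Assumption: a leading '<' or '>' is simply dropped, so '>4' becomes 4.0.
    """
    if value is None:
        return None
    text = str(value).strip()
    # Accept an optional '<' or '>' followed by a plain number.
    match = re.fullmatch(r"[<>]?\s*(\d+(?:\.\d+)?)", text)
    if match is None:
        return None  # empty strings and other labels count as missing
    return float(match.group(1))


def impute_median(raw_values):
    """Parse every entry and replace missing ones with the median of the rest."""
    parsed = [to_years(v) for v in raw_values]
    observed = [x for x in parsed if x is not None]
    if not observed:
        raise ValueError("no numeric values available to compute a median")
    fill = median(observed)
    return [fill if x is None else x for x in parsed]


# Made-up sample standing in for the Last_New_Job column.
sample = ["1", ">4", None, "2", "", "3"]
cleaned = impute_median(sample)
print(cleaned)
```

If the column is better kept categorical, so that ">4" stays its own category, filling the gaps with the mode avoids having to pick a number for the open-ended bucket.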

 


0 Replies