Excel .csv Conversion - Date-Formatting keeps returning even with prior formatting

Copper Contributor

Hi all,

 

I have an issue regarding the formatting of one column in a dataset. To shortly introduce it: We have a question on how many books participants have at home. They can select 'none', '1-10', '11-25', etc. Excel being Excel, it represents those two mentioned as dates. Before converting my .xlxs-file to .csv, I formatted this column to be non-date (I tried Text, Custom (with choice '0'), and some other options, all have the same issue). After this formatting, I exported the .xlxs as a .csv (not UTF-8 .csv) and it represents it just fine ('11-25' stays '11-25' rather than 'nov-25'). However, when working in this file (it's continually updated, normally), the trouble starts. Mind, it's already in a .csv format by now, and the formatting was done prior to the conversion to .csv. Once I save this harmless looking file, the date-format comes back like a zombie rising from the dead and there's seemingly nothing I can do to change it. All changes to the format afterwards result in a weird integer-string of 5 numbers (45962, in various forms depending on the format but with these 5 integers there).

I already had contact with 2 chat-services on the issue, and together with the internet, they suggested that I could create a new 'Custom'-format (for instance, by saying I expect my data to be '11-25'; see https://support.office.com/en-us/article/Format-a-date-the-way-you-want-8E10019E-D5D8-47A1-BA95-DB95...). However, this would be relevant if I only had 11-25 as potential date-conversion-problem, but 1-10 will be, in infinite American wisdom, also revert back to a backwards date-format (jan-10).

Any idea on how to fix this? At the moment, I can only ever convert the file once, and never again because I cannot make change without it switching back to a date-format.

 

Help is greatly appreciated, as I'm quite close to committing murder on my laptop if it keeps annoying me like this.

 

12 Replies

@ThomasBraasETH When you save something like "1-10" to a csv file it just becomes a string of characters. No information regarding the Excel format (as text) of that string is saved with the csv file. When you then open the file in Excel, you should see a text import wizard where you have to go through the steps presented to you. In step 3, you need to specify the data type as Text. If you skip all of it and just press Finish at the beginning, Excel tries to be smart and it will guess the data type for you. Hence, entries like 1-10 and 11-25 could be dates. If it encounters 13-18 Excel will conclude that this can not be a date and imports it as a text. So, it's important that you do the data typing yourself.

 

And by the way, the "weird 5 digit number" 45962 is actually the day number in the internal Excel calendar for November 11, 2025. The day counter start at 1 for January 1, 1900.

@Riny_van_EekelenI'm not exactly sure if that solves the problem I've been having, or I'm simply not getting your idea. I already have a data-file, an .xlxs file. I converted this successfully to a .csv, still in Excel. I get that Excel is transforming the data automatically because I have not specified each and every column. I did that because I don't care about the 50 other columns in my datafile but only the one column that actually causes this whole headache. As such, I successfully transformed the one column I was interested in and if I don't change anything, that formatting is still there in the .csv file. The moment I add any row in that .csv file, and save the file (again, as .csv, as it already was), it removes the formatting it had before and that one column suddenly appears as date again.

As such, I'm not importing text into excel, the file is an .xlxs file with a column that could be regarded as text, so to speak.

@ThomasBraasETH 

Could you perform a global edit and replace the soft hyphen '-' (hex 00AD) by an En Dash '' (hex 2013) or possibly a non-breaking hyphen (hex 2010).  These would be visually similar to the original but Excel would no longer try to convert the strings into date values.

@PeterBartholomew1In a global edit, do you mean to reformat the column? Or simply find and replace? Most 'global edit' searches lead me back to reformatting, it seems. I could replace all the different hyphens I guess, although I'm not sure how this would work when entering new data and then storing it again. At the moment, the moment I save the file after the conversion to .csv, it happily destroys the formatting already in place and I'm back at square one. After I did that, I can't get the actual input back, as Excel has already decided to create a date-column and as a result, any formatting automatically produces this string counted from 01-01-1900 I believe.

@ThomasBraasETH 

If there is any point at which you have a text string comprising

2 digits, soft hyphen, 2 digits

and are in an environment with an editor, that is the moment to convert the soft hyphen into another character.  If on the other hand, you already have a mix of 15-20, 15-Oct, 5-Oct then you are faced with the task of unscrambling the omelette.

 

The values you require will be held within the date so a formula

= IF( ISNUMBER(interval), DAY(interval) & UNICHAR(8208) & MONTH(interval), interval)

would be a start to getting a column that you can Copy / Paste values over the top.

@PeterBartholomew1Well, the weird thing is that I'm not stuck with the omelette left unscrambling when I initially save the file as a .csv. It only happens again after I make any change and save it again. As we get the data from a different program entirely, we get in the form of an existing .xlxs file, and I'm not sure how to then re-edit that hyphen into a different character as I'm not making the .xlxs file myself with any editor of sorts. Is there any direction you can point me towards to get that going? Or otherwise, how to best use the formula you described? I'm assuming this has to utilize the initial column and then transform into a new column (meaning I'd likely have to do this every time I get additional data, which I what I wanted to prevent).

@ThomasBraasETH 

What I had in mind was to select the offending column in Excel before it is corrupted and use Replace (Ctrl+H) to replace the soft hyphen by another character.

Avoiding the use of CSV files would appear the be good abut, assuming that is not possible, and the eggs are broken then type the formula into a helper column and use the resulting values from there.

 

@PeterBartholomew1 Just tried it and the result is that all hyphen are turned into question marks, likely because the different hyphen is not supported ('26?50' is the result). So it works well before the conversion to .csv but stops working after. Unfortunately, we need the files in .csv format to be in compliance with our database, so working in .xlxs is not an option.
How would I use that formula exactly? I would imagine this is a bug that needs fixing, considering Excel undoes an earlier action because it saves a file, but I can't wait for that to ever happen. If I can use that formula to create a new column that reverts the 5-digit integer string to what it originally should be, I'm okay with that.

@ThomasBraasETH 

Looks like you need to stick to the regular ASCII keyboard characters such as tilde "~", underscore "_" or vertical line "|".  The attraction of the non-breaking hyphen or n-dash was their visual similarity to the normal hyphen.

 

As I understand it, anything you do by formatting is lost once you convert to CSV.  I imagine the damage is done once you bring a CSV file back into Excel and it tries to 'make sense' of it.  Viewed in a text editor and the CSV file is probably still OK.

 

The formula idea only kicks in once the damage is done and you have a mix of text, US dates and 'rest of world' dates.  By inserting a formula column that returns the day and month for any date (a number) as text you get to see what was in the CSV file before Excel helpfully scrambled it.  Copy/paste values could be used to overwrite the misbehaving ranges with something more acceptable.

 

Not that I would consider this a well-ordered process.  I really dislike such manual processing.

 

One further thought.  Have you tried importing the data using Power Query.  That should provide the opportunity to edit and adjust the number formats before trying to load to an Excel table.

@PeterBartholomew1No never tried Power Query, but I also don't think it's possible. We have a scanning system in place which delivers all data already in excel-files. So there's only importing from one type of excel-file to a .csv. We don't have a text-file of raw data which is then exported to .xlxs.

@ThomasBraasETH 

Opening csv file Excel performs all operation as with excel file, that's how the parser works. For example, if you have comma delimited file with one text string

1,2,=a1+b1

an open it in Excel, result in a1, b1 and c1 will be

1 2 3

and working formula in C1.

 

To prevent default converting of texts to numbers and dates perhaps the easiest way is to add the space at the front of text which could be converted, or two single apostrophes, but it's more visible.

If such data

image.png

save as csv and open it in Excel, we will have

image.png

 i.e. only A2 is converted.

 

Otherwise Get Data from Text File using legacy connector and applying Text format to all columns on the third step of the wizard.

@ThomasBraasETH 

If, at any point in the process you have a valid file (.txt, .csv, .xlsx), Power Query will import it and, assuming the data is laid out as a database table, it can be loaded to a Table in Excel.  If you chose you could separate the upper and lower limits into two fields or replace the hyphen by underscore (say) before loading to the Excel table.  Fresh data could be appended from multiple files.

 

If you need more flexibility, @SergeiBaklan 's approach should offer that.