Known issues with TSV export files

%3CLINGO-SUB%20id%3D%22lingo-sub-2429072%22%20slang%3D%22en-US%22%3EKnown%20issues%20with%20TSV%20export%20files%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-2429072%22%20slang%3D%22en-US%22%3E%3CP%3EI%20have%20noticed%20a%20few%20hopefully%20fixable%20problems%20that%20currently%20render%20the%20exported%20TSV%20files%20less%20useful%20than%20they%20should%20be%2C%20and%20certainly%20these%20are%20issues%20that%20other%20users%20should%20be%20aware%20of.%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EThe%20files%20are%20row%20delimited%20using%20the%20LINEFEED%20character%26nbsp%3B%20(ASCII%20character%2010)%2C%20and%20within%20each%20row%20the%20columns%20are%20delimited%20by%20the%20horizonatal%20TAB%20(ASCII%20character%209).%20The%20problem%20is%20that%20these%20characters%20can%20be%20embedded%20within%20an%20individual%20field%20as%20well%20(for%20example%20Customer%20Name%20Joe%5BTAB%5DBloggs).%20When%20this%20occurs%2C%20because%20there%20are%20no%20text%20qualifiers%2C%20every%20field%20in%20the%20row%20beyond%20the%20embedded%20tab%20is%20pushed%20out%20by%20one.%20If%20a%20LINEFEED%20is%20embedded%20in%20a%20field%20(for%20example%20in%20Customer%20Address%3A%201%20Microsoft%20Way%5BLINEFEED%5DRedmond)%2C%20it%20creates%20a%20new%20line%20at%20%22Redmond%22.%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EIs%20it%20possible%20to%20replace%20the%20affected%20embedded%20control%20characters%20with%20something%20more%20benign%2C%20like%20SPACE%20(ASCII%20character%2032)%20during%20the%20generation%20of%20the%20extract%2C%20or%20to%20use%20text%20qualifiers%20such%20as%20double%20quotes%20to%20surround%20affected%20fields%3F%20Or%20simply%20handle%20them%20at%20the%20point%20of%20entry%3F%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EThe%20file%20export%20date%20ranges%20appear%20to%20be%20based%20on%20the%20booking%20datetime.%20If%20there%20was%20an%20option%20to%20base%20it%20on%20the%20last%20modified%20date%2C%20it%20would%20open%20up%20the%20possibility%20of%20incremental%20extracts%2C%20thus%20hugely%20reducing%20the%20burden%20on%20MS's%20servers%20and%20everyone%20else%20who%20can%20implement%20incremental%20handling%20of%20these%20files.%20To%20be%20100%25%20sure%20all%20changes%20have%20been%20captured%2C%20the%20user%20has%20no%20alternative%20other%20than%20to%20download%20every%20single%20booking%2C%20every%20single%20time.%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EIt's%20possible%2C%20(indeed%20common)%20for%20customers%20to%20create%20bookings%20that%20are%20almost%20duplicated%20(say%20all%20fields%20except%20two%20identical).%20When%20they%20appear%20in%20the%20same%20extract%2C%20there%20seems%20to%20be%20no%20way%20to%20know%20which%20record%20takes%20precedence.%20If%20we%20pick%20the%20wrong%20record%20in%20the%20belief%20that%20it%20is%20the%20most%20recent%2C%20this%20can%20cause%20embarrassment.%20%22What%20is%20wrong%20with%20your%20%26lt%3B%26amp%3B%26amp%3B%24%5E%25%23%40%26amp%3B%5E%26gt%3B%20booking%20system!%20I%20corrrected%20that!%22%20This%20could%20be%20solved%20with%20by%20including%20a%20modified%20date%20and%20time%20in%20each%20row%20or%20some%20sequence%20number%20based%20on%20the%20time%20of%20the%20change.%3C%2FP%3E%3CP%3E%26nbsp%3B%3C%2FP%3E%3CP%3EIncluding%20the%20selection%20criteria%20metadata%20in%20the%20filename%20would%20also%20be%20a%20big%20improvement%20eg%20MSBookings_mycal_20210601_20210606.tsv%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-LABS%20id%3D%22lingo-labs-2429072%22%20slang%3D%22en-US%22%3E%3CLINGO-LABEL%3EBookings%20TSV%20export%3C%2FLINGO-LABEL%3E%3C%2FLINGO-LABS%3E
Occasional Visitor

I have noticed a few hopefully fixable problems that currently render the exported TSV files less useful than they should be, and certainly these are issues that other users should be aware of.

 

The files are row delimited using the LINEFEED character  (ASCII character 10), and within each row the columns are delimited by the horizonatal TAB (ASCII character 9). The problem is that these characters can be embedded within an individual field as well (for example Customer Name Joe[TAB]Bloggs). When this occurs, because there are no text qualifiers, every field in the row beyond the embedded tab is pushed out by one. If a LINEFEED is embedded in a field (for example in Customer Address: 1 Microsoft Way[LINEFEED]Redmond), it creates a new line at "Redmond".

 

Is it possible to replace the affected embedded control characters with something more benign, like SPACE (ASCII character 32) during the generation of the extract, or to use text qualifiers such as double quotes to surround affected fields? Or simply handle them at the point of entry?

 

The file export date ranges appear to be based on the booking datetime. If there was an option to base it on the last modified date, it would open up the possibility of incremental extracts, thus hugely reducing the burden on MS's servers and everyone else who can implement incremental handling of these files. To be 100% sure all changes have been captured, the user has no alternative other than to download every single booking, every single time.

 

It's possible, (indeed common) for customers to create bookings that are almost duplicated (say all fields except two identical). When they appear in the same extract, there seems to be no way to know which record takes precedence. If we pick the wrong record in the belief that it is the most recent, this can cause embarrassment. "What is wrong with your <&&$^%#@&^> booking system! I corrrected that!" This could be solved with by including a modified date and time in each row or some sequence number based on the time of the change.

 

Including the selection criteria metadata in the filename would also be a big improvement eg MSBookings_mycal_20210601_20210606.tsv

1 Reply
Great feedback @g-regex!