Data ingestion: does it make you think of the awkward moments right after the Thanksgiving meal? It shouldn't!
At Baromitr, we are in the middle of major feature development, and we've had to think critically about this subject. So we thought: why not share what we've learned?
When ingesting data, there are two critical considerations: data size and the frequency of data updates. (If you wanted to add a third dimension, you could consider the effort required to transform the data.)
Together, these two dimensions shape which data ingestion method is appropriate: web-based manual entry, CSV/flat-file upload, or automated, direct-data integration.
The graph above shows our take on which ingestion method to use, and when.
Web-based manual entry allows users with small amounts of data, who are typically business-focused users or subject matter experts, to insert that data directly whenever time allows. The tool can, of course, be optimized: when data needs to be structured at the point of entry (which is ideal), the tool can be built to enforce a finite list of input options. That's the key to structured data. We believe this method is ideal when uploads range from 1 to 100 records with monthly updates, or up to 500 records if the data is updated no more than quarterly.
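To make the "finite list of input options" idea concrete, here is a minimal sketch of point-of-entry validation. The `Region` options and the `validate_entry` helper are hypothetical, chosen only for illustration; a real form would typically render these options as a dropdown and run the same check on the server.

```python
from enum import Enum


class Region(Enum):
    # Hypothetical finite list of input options offered by the entry form.
    NORTH = "North"
    SOUTH = "South"
    EAST = "East"
    WEST = "West"


def validate_entry(region: str, value: str) -> dict:
    """Accept a manually entered record only if it uses an allowed option."""
    allowed = {r.value for r in Region}
    if region not in allowed:
        raise ValueError(f"region must be one of {sorted(allowed)}, got {region!r}")
    # Store the enum member so downstream code always sees structured data.
    return {"region": Region(region), "value": float(value)}


record = validate_entry("North", "42.5")
print(record)
```

Because the input is rejected before it is saved, nothing needs to be cleaned up later. That is the structuring benefit that disappears once users start uploading files.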
As data grows beyond this range, its size begins to reduce a user's effectiveness at manual entry. Enter CSV upload. The assumption here is that the data can more easily be put into this format. We do begin to lose the structuring benefits of the manual entry interface, so extra data transformation processes must be put in place after upload.
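The post-upload transformation step can be sketched as follows. This minimal example assumes a CSV with hypothetical `region` and `value` columns and a hypothetical alias table. It maps free-text spellings back onto the same finite options a manual-entry form would enforce, and it sets aside rows it cannot interpret so they can be sent back for correction.

```python
import csv
import io

# Hypothetical mapping from free-text spellings found in uploaded files to the
# finite set of values a manual-entry form would have enforced.
REGION_ALIASES = {
    "north": "North", "n": "North",
    "south": "South", "s": "South",
    "east": "East", "e": "East",
    "west": "West", "w": "West",
}


def transform_upload(file_text: str) -> tuple[list[dict], list[dict]]:
    """Normalize uploaded CSV rows; return (clean_rows, rejected_rows)."""
    clean, rejected = [], []
    for row in csv.DictReader(io.StringIO(file_text)):
        # Missing cells come back as None from DictReader, so default to "".
        region = REGION_ALIASES.get((row.get("region") or "").strip().lower())
        try:
            value = float(row.get("value") or "")
        except ValueError:
            value = None
        if region is None or value is None:
            rejected.append(row)  # send back to the user for correction
        else:
            clean.append({"region": region, "value": value})
    return clean, rejected


sample = "region,value\nNorth,10\n s ,2.5\nMidwest,7\nEast,abc\n"
clean, rejected = transform_upload(sample)
```

In this sample, the first two rows are normalized. The "Midwest" row and the non-numeric "abc" row are rejected. Every alias the table has to cover represents work the manual form would have prevented up front.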
Beyond monthly update frequencies, the transformation work and the data input work required of the user put pressure on the previous two methods, and we need to move to automated, direct-data ingestion into software systems. This comes with increased costs, but allows for daily updates, if not more frequent ones.
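The rules of thumb above can be condensed into a small decision function. This is only a sketch of one reading of those thresholds. It treats "monthly" as 12 updates per year and "quarterly" as 4. It treats anything more frequent than monthly as a case for automation and sends everything else that falls outside the manual-entry range to CSV upload. The function name and the string labels are hypothetical.

```python
def choose_ingestion_method(records_per_upload: int, updates_per_year: float) -> str:
    """Suggest an ingestion method from data size and update frequency.

    Assumes 'monthly' means 12 updates per year and 'quarterly' means 4.
    """
    if records_per_upload < 1:
        raise ValueError("records_per_upload must be at least 1")
    if updates_per_year > 12:
        # More frequent than monthly: automated, direct-data integration.
        return "automated"
    if records_per_upload <= 100:
        # 1 to 100 records, updated monthly or less often.
        return "manual"
    if records_per_upload <= 500 and updates_per_year <= 4:
        # Up to 500 records, updated no more than quarterly.
        return "manual"
    # Too large for comfortable manual entry, but not frequent enough to automate.
    return "csv"


print(choose_ingestion_method(400, 12))
```

Real decisions also depend on budget and on how much transformation work the data needs. A function like this is therefore best treated as a first pass, not a final answer.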