5 Keys to Document Migration Without Screwing It Up

If anyone tells you migrating legacy documents is easy, you have my permission to kick them in the shin!  
Migrating legacy documents is complex, and there are many factors businesses need to consider when deciding whether now is a good time to organize and clean house.  
I don’t want to scare you; I really don’t. But a lot is at stake, and no one wants to leave valuable data behind in the old system while the new one sits empty and shiny! Document migration and its related activities can consume around 50% of the deployment effort.  
How can we successfully transition legacy content into a brand-spanking new system and preserve all its history? Here are some tips.  


There is no one-button approach to migration. It is a complex, time-consuming endeavour. It deserves its own project plan, strategy, budget, and team. An entity-level scope and plan are a must-have right at the beginning, so there are no sudden exclamations of “Oh, we forgot to load THESE reports. Who will do that?” right before the deadline.  
You can do it in one go (not recommended) or in small batches every week, but this is not as easy a decision as you might think. Everyone must agree, and there must be clear communication to all business and technical stakeholders about when and what data will be in the new system. The same applies to any system outages.   

You must also be in a position to answer the following questions: 

  • Whose content is being migrated? HR, IT, Finance, Enterprise-wide? 
  • What’s the business driver for migration? For example, a new cloud-based system for storing files, staff retirements, or the need to retain tacit knowledge. 
  • What content is eligible to be migrated? 
  • Once the data is migrated to the new system, will the legacy repository still be used or made read-only? 
  • Will the legacy repository be decommissioned? If so, when (e.g. after one year)? 
  • Do we have access to the source repository, a staging area (for cataloguing, purging, and sorting) and a target system to store the files? 

This brings us to time and effort.  


Document migration is complex and consumes an inordinate amount of time, and much of that time-consuming work is invisible at the start.  
Every project stage requires careful consideration: understanding each field, mapping source fields to target fields, configuring or building transformations, testing, measuring the data quality of each field, and deciding how many ‘tags’ or categories to apply to the documents.  
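To make the mapping step a little more concrete, here is a minimal Python sketch of a source-to-target field map, with one transformation per field and a very simple data-quality measure (the share of records where a field is filled in). The legacy field names (`DocTitle`, `Dept`, `Created`), the target names, and the DD/MM/YYYY date format are all hypothetical; a real project would build this table from the actual source and target schemas.

```python
from datetime import datetime

# Hypothetical map: legacy field -> (target field, transformation to apply).
FIELD_MAP = {
    "DocTitle": ("title", str.strip),
    "Dept": ("department", lambda v: v.strip().upper()),
    # Assumes legacy dates are stored as DD/MM/YYYY; the target wants ISO 8601.
    "Created": ("created_date",
                lambda v: datetime.strptime(v.strip(), "%d/%m/%Y").date().isoformat()),
}

def transform(record):
    """Map one legacy record to the target schema, skipping empty values."""
    out = {}
    for src, (dst, fn) in FIELD_MAP.items():
        value = record.get(src)
        if value is not None and value.strip():
            out[dst] = fn(value)
    return out

def field_fill_rate(records, field):
    """Share of records with a non-empty value for a legacy field."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if (r.get(field) or "").strip())
    return filled / len(records)

legacy = [
    {"DocTitle": " Road repair plan ", "Dept": "ops", "Created": "03/02/2019"},
    {"DocTitle": "untitled", "Dept": "", "Created": "15/11/2008"},
]
print([transform(r) for r in legacy])
print(field_fill_rate(legacy, "Dept"))  # 0.5
```

A low fill rate on a field the new system requires is exactly the kind of gap worth spotting before the build phase.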
Tools such as Sharegate, Jitterbit, Midas, or Starfish ETL can help, especially by reducing time in the build phase.  
But understanding the source data, the most crucial task in any migration project, cannot be done by automated tools. It requires analysts to take the time to go through the documents and determine how much effort will be needed to classify, purge, and enrich them with new metadata. 
If you want a very simplistic estimate, allow one 8-hour day for every collection of like content transferred from the legacy system to the new one (assuming you use Excel or a similar spreadsheet, where you can quickly copy and paste like property values).  
There are, of course, exceptions, like data replication between the same source and target schemas without further transformation, also known as a 1:1 migration, where we can base the estimate on the number of tables to copy.  
Creating a detailed estimate is very much an art.  
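The simplistic rule of thumb is plain arithmetic. As a rough sketch, assuming you have already grouped the source into collections of like content, it could look like this; the `analysts` parameter (people working on different collections in parallel) is a hypothetical addition, not part of the rule itself.

```python
import math

HOURS_PER_COLLECTION = 8  # rule of thumb: one 8-hour day per collection of like content

def estimate_effort(collections, analysts=1, hours_per_day=8):
    """Return (total_hours, working_days) for a list of collection names.

    Duplicate names are counted once. `analysts` is a hypothetical
    parameter for people working on different collections in parallel.
    """
    total_hours = len(set(collections)) * HOURS_PER_COLLECTION
    working_days = math.ceil(total_hours / (hours_per_day * analysts))
    return total_hours, working_days

collections = ["HR/Employee files", "Finance/Invoices", "Finance/Invoices", "IT/Contracts"]
print(estimate_effort(collections))              # (24, 3)
print(estimate_effort(collections, analysts=2))  # (24, 2)
```

Treat the result as a floor: it does not include cleaning, testing, or business review time.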

Some questions to ask will be: 

  • Do we have the migration tools available? 
  • What formats are in scope? For example, are executables and databases in or out? Emails? Microsoft Office files? Older, obsolete formats? 
  • Will the documents be related or connected to system data, business intelligence or process automation? 
  • How many years of content will be included, and what is the total volume and file count? 
  • Is there any cleaning (categorizing) anticipated? 
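Several of these questions (formats in scope, years of content, total volume and file count) can be answered with a quick inventory of the source. The sketch below assumes the legacy content is reachable as an ordinary folder tree, and it uses each file's last-modified date as a stand-in for its age, which may be unreliable if the files were copied or migrated before.

```python
import os
import tempfile
from collections import Counter, defaultdict
from datetime import datetime
from pathlib import Path

def inventory(root):
    """Count files and bytes per extension, and files per year last modified."""
    by_ext = defaultdict(lambda: [0, 0])  # extension -> [file count, total bytes]
    by_year = Counter()                   # year last modified -> file count
    total_files = total_bytes = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        ext = path.suffix.lower() or "(none)"
        by_ext[ext][0] += 1
        by_ext[ext][1] += info.st_size
        by_year[datetime.fromtimestamp(info.st_mtime).year] += 1
        total_files += 1
        total_bytes += info.st_size
    return {
        "files": total_files,
        "bytes": total_bytes,
        "by_ext": {ext: tuple(v) for ext, v in by_ext.items()},
        "by_year": dict(by_year),
    }

# Usage on a throwaway folder: one old WordPerfect file and one Word file.
demo = Path(tempfile.mkdtemp())
(demo / "memo.wpd").write_bytes(b"x" * 10)
(demo / "sub").mkdir()
(demo / "sub" / "report.DOCX").write_bytes(b"y" * 5)
old = datetime(2004, 6, 1).timestamp()
os.utime(demo / "memo.wpd", (old, old))
print(inventory(demo))
```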


Optimism is not your friend when it comes to data quality. Even if the legacy system isn’t giving you any trouble, there will be issues, and there will be many.  
Every new system has new rules, and some may even conflict with the legacy rules. For instance, email correspondence may be required in the new system but not in the legacy one.  
Watch out for the occasional bump that comes with documents no one has touched in years. There may be legacy documents still using WordPerfect that now need to be converted to Microsoft Word or PDF. Check for media/format obsolescence and media degradation (e.g. floppy disks).  
A good rule is “the older it is, the bigger mess we are going to find.” It is vital that you decide early on just how much history you want to transfer to the new system based on its long-term legal and operational retention value.  

Here are a few questions that will need answers:

  • What keywords would enable quicker search and make documents easier to relate to one another? 
  • Do we have to rename files whose names are longer than 255 characters? 
  • Are any files nested so many subfolders deep that their full path is too long to transfer? 
  • What do we do with files with names like untitled.doc or doc1.doc, or even joe.doc? 
  • Will the new repository accept special characters in file names (#, $, %, /)? 
  • What do we do with empty documents? 
  • Do we have to convert file formats? 
  • Are emails included? Skype, Slack, or other chat records? Zoom recordings? 
  • What do we do with redundant, outdated, trivial information (ROT)? 
  • After purging and cleansing, do we have the same count or a reduced count of files? 
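Some of these checks can be automated before anyone opens a document. Below is a minimal sketch that flags long names and paths, generic names, special characters, and empty files, and finds exact duplicates by hashing file contents. The limits (`MAX_NAME`, `MAX_PATH`) and the character and name patterns are assumptions to replace with your target system's actual rules. Hashing only catches identical copies; deciding what is outdated or trivial (and what to do with a `joe.doc`) still needs a human.

```python
import hashlib
import re
import tempfile
from collections import defaultdict
from pathlib import Path, PurePosixPath

# Assumed rules: replace with the real limits of your target system.
MAX_NAME = 255
MAX_PATH = 400
SPECIAL_CHARS = set('#$%&*:<>?|"')
GENERIC_NAME = re.compile(r"^(untitled|doc\d*|document\d*)\.\w+$", re.IGNORECASE)

def check_path(rel_path, size):
    """List naming and content issues for one file, given its path relative to the source root."""
    name = PurePosixPath(rel_path).name
    issues = []
    if len(name) > MAX_NAME:
        issues.append("name too long")
    if len(rel_path) > MAX_PATH:
        issues.append("path too long")
    if GENERIC_NAME.match(name):
        issues.append("generic name")
    if SPECIAL_CHARS & set(name):
        issues.append("special characters")
    if size == 0:
        issues.append("empty file")
    return issues

def find_duplicates(root):
    """Group files under root by SHA-256 of their content; return groups with 2+ files."""
    root = Path(root)
    groups = defaultdict(list)
    for p in sorted(root.rglob("*")):
        if p.is_file():
            digest = hashlib.sha256(p.read_bytes()).hexdigest()
            groups[digest].append(p.relative_to(root).as_posix())
    return [paths for paths in groups.values() if len(paths) > 1]

print(check_path("HR/untitled.doc", 0))             # ['generic name', 'empty file']
print(check_path("Finance/Q3 #final$.xlsx", 2048))  # ['special characters']

demo = Path(tempfile.mkdtemp())
(demo / "a.txt").write_text("same")
(demo / "b.txt").write_text("same")
(demo / "c.txt").write_text("different")
print(find_duplicates(demo))                       # [['a.txt', 'b.txt']]
```

Running the same checks before and after purging also gives you the file counts to compare.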


Business people are the creators and consumers who genuinely understand the data and who, in the end, can recognize what has historical value to keep and what to dispose of. This is why it is essential to have someone from the business team involved in classifying and mapping documents.   
This is where running a test batch and then letting the business team go at it is in your best interest. You may hear, “Oh, I see now. Right, we are going to have to change that.” a lot.  
Business users add “context to content” and provide a deep understanding of the documents: where they come from, what their subject is, where they are located, and how they relate to other daily office duties (e.g. employee file, field investigation, road repair, mortgage assessment, litigation, event campaign, etc.).  
If you don’t engage the subject matter experts, your new system will contain documents that won’t accurately relate to historical and current-day business activity. Furthermore, the ability to interconnect documents and system data for business intelligence and automation will be significantly hampered without this context.

Some questions to nail down with the business owners and authors:

  • To whom does the information belong?  
  • Have some of the owners left the organization? If so, who will take responsibility for this information? 
  • Is there 3rd party information we are keeping as an internal record, or is the 3rd party expected to keep the official record? 
  • Are users aware of the migration, and do they have time allocated to participate in cleaning?  


As I mentioned initially, trying to do this in one big bang is not recommended. We know it is a crappy job and hope it can be one and done, but the truth is you will probably be migrating in “waves.” This means repeating actions multiple times.  
Typically, the first wave is a dry run covering about 25% of the documents, where you check for accuracy and the time taken to load. The second is a repeatable batch load of ‘like’ content, run until the entire scope of documents for upload is complete. The poorer the data quality, the more runs will have to take place.   
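Here is a rough sketch of how the waves could be planned: a dry run drawn from every collection (about 25% of each, rounded up, so all kinds of content get exercised), followed by one batch per collection of like content. The `(doc_id, collection)` input format and the fixed random seed are assumptions, and the sketch assumes documents loaded in the dry run are not reloaded in later waves.

```python
import math
import random
from collections import defaultdict

def plan_waves(docs, dry_run_share=0.25, seed=42):
    """Split (doc_id, collection) pairs into a dry-run sample and per-collection batches.

    The dry run takes about `dry_run_share` of each collection (rounded up),
    so every kind of content is exercised; the rest becomes one batch per
    collection for the repeatable loads that follow.
    """
    rng = random.Random(seed)  # fixed seed so the same plan can be regenerated
    by_collection = defaultdict(list)
    for doc_id, collection in docs:
        by_collection[collection].append(doc_id)

    dry_run, batches = [], {}
    for collection in sorted(by_collection):
        ids = sorted(by_collection[collection])
        rng.shuffle(ids)
        k = math.ceil(len(ids) * dry_run_share)
        dry_run.extend(ids[:k])
        batches[collection] = ids[k:]
    return dry_run, batches

docs = [(f"HR-{i}", "HR") for i in range(8)] + [(f"FIN-{i}", "Finance") for i in range(4)]
dry_run, batches = plan_waves(docs)
print(len(dry_run), {c: len(b) for c, b in batches.items()})  # 3 {'Finance': 3, 'HR': 6}
```

Sampling per collection, rather than 25% of everything at random, keeps a small collection from being skipped entirely in the dry run.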

Questions to ask to optimize the migration process:

  • Have any changes been made to the source documents during this transition period (otherwise known as deltas)?  
  • If data discrepancy errors occur during upload, will business users be available to investigate and update? 
  • Will the source repository be converted to ‘read only’ after migration?  
  • When will the source and staging copies be decommissioned or deleted?   
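To answer the delta question, one approach (an assumption here, not something the process above prescribes) is to take a snapshot of the source before each wave, recording a content hash per file, and compare it with a fresh snapshot before the next wave. This sketch assumes the source is reachable as an ordinary folder tree.

```python
import hashlib
import tempfile
from pathlib import Path

def snapshot(root):
    """Map each file's path (relative to root) to the SHA-256 of its content."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def deltas(before, after):
    """Compare two snapshots: files added, changed, or deleted in between."""
    added = sorted(after.keys() - before.keys())
    deleted = sorted(before.keys() - after.keys())
    changed = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
    return {"added": added, "changed": changed, "deleted": deleted}

# Usage: snapshot before a wave, then check again before the next one.
src = Path(tempfile.mkdtemp())
(src / "a.doc").write_text("v1")
(src / "old.doc").write_text("obsolete")
before = snapshot(src)
(src / "a.doc").write_text("v2")   # edited after the snapshot
(src / "b.doc").write_text("new")  # created after the snapshot
(src / "old.doc").unlink()         # deleted after the snapshot
print(deltas(before, snapshot(src)))
# {'added': ['b.doc'], 'changed': ['a.doc'], 'deleted': ['old.doc']}
```

Anything in the "changed" or "added" lists belongs in the next wave; anything "deleted" is a question for the business owners.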

Moving legacy documents into a new system is a complex journey with many hidden potholes. Preparation is everything, so expect the unexpected. The key is clear communication on the process, a dedicated team, and patience.