Salesforce Data Migration: The Complete Guide to Getting It Right

Data migration is the part of a Salesforce implementation that gets underestimated almost every time. It’s not glamorous, it’s not in the demo, and it’s easy to defer during the sales cycle with vague assurances that “we’ll figure out the data in the later phases.” Then the project goes live and the sales team discovers that their 7-year account history is missing, or worse — it’s there but it’s wrong.

Here’s the full picture: what a proper migration actually involves, which tools are right for which scenarios, what data quality actually means in practice, and what it costs.

Why Data Migration Is the Most Underestimated Cost

The reason migration consistently blows budgets isn’t complexity — it’s discovery. The data in your source system is almost never what people think it is. Legacy CRM systems, ERP databases, and spreadsheet-based processes accumulate years of inconsistency: duplicate records, missing required fields, broken relationships, field values that made sense in 2015 and don’t map to your new Salesforce data model.

You don’t know the full extent of this until you profile the data, and you can’t fully profile the data until you’ve extracted it. By the time you’ve done both, the project timeline has already been set based on assumptions about how clean the data would be.

A mid-market implementation migrating 5 years of account, contact, opportunity, and activity data from a legacy CRM typically encounters: 15–30% duplicate account records, 20–40% of contact records with missing or malformed email addresses, relationship gaps (opportunities with no associated account because the original system allowed it), and field value inconsistencies that require manual cleansing decisions that can’t be automated. None of that was in the original scope estimate.

The 5 Phases of a Proper Migration

Phase 1: Extract

Extraction is pulling data out of the source system in a usable format. For most systems, this means querying the database directly or using an export API, producing flat files (CSV) or structured data that downstream tools can process.

Extraction sounds simple but has real risks. Source systems often have complex relational structures — your legacy CRM might have accounts in one table, contacts in three tables with different status codes, and custom object data in 15 auxiliary tables. Understanding the source schema well enough to extract complete, related records requires someone with technical access to the source system and enough time to document what they find.

If you’re migrating from a cloud system (HubSpot, Dynamics, Zoho, an older Salesforce org), API-based extraction is usually more reliable than database dumps. APIs return structured, documented data; database dumps return raw tables that require schema interpretation. Extract everything — even data you think you won’t need. Storage is cheap; re-extracting from a decommissioned system is sometimes impossible.

Phase 2: Profile

Profiling is the analytical step that most companies skip or rush, and it’s where accurate cost and timeline estimates actually come from.

A proper data profile assesses: record counts by object and data source, completeness rates for every field (percentage of records that have a value), uniqueness analysis (how many apparent duplicates exist, based on name/email/phone/address matching), value distribution (what values appear in picklist and text fields, and whether they match your target Salesforce picklists), and relationship integrity (how many child records point to parents that no longer exist, how many relationships are broken).
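The profile metrics above are simple enough to compute yourself once the data is extracted. Here is a minimal Python sketch, assuming records have already been loaded into lists of dicts. Every function, field name, and value below is a hypothetical example rather than part of any particular tool:

```python
from collections import Counter


def completeness(records, field):
    """Share of records with a non-blank value in `field` (0.0 to 1.0)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(field) or "").strip())
    return filled / len(records)


def surplus_duplicates(records, key_fields):
    """Records beyond the first in each group sharing the same normalized key."""
    keys = [tuple(str(r.get(f) or "").strip().lower() for f in key_fields) for r in records]
    counts = Counter(k for k in keys if any(k))  # ignore records whose key is entirely blank
    return sum(n - 1 for n in counts.values() if n > 1)


def unmapped_values(records, field, target_picklist):
    """Source values (with counts) that have no exact match in the target picklist."""
    distribution = Counter(r.get(field) for r in records)
    return {value: n for value, n in distribution.items() if value not in target_picklist}


def orphan_count(children, parent_field, parent_ids):
    """Children whose parent reference is set but points to a parent that doesn't exist."""
    return sum(1 for c in children if c.get(parent_field) and c[parent_field] not in parent_ids)


accounts = [
    {"Id": "A1", "Name": "Pfizer Inc.", "Industry": "Pharma"},
    {"Id": "A2", "Name": "Acme", "Industry": ""},
]
contacts = [
    {"Id": "C1", "AccountId": "A1", "Email": "a@x.com", "Status": "Active"},
    {"Id": "C2", "AccountId": "A9", "Email": "A@X.com ", "Status": "Actv"},
    {"Id": "C3", "AccountId": None, "Email": None, "Status": "Active"},
]

print(completeness(accounts, "Industry"))                                 # 0.5
print(surplus_duplicates(contacts, ["Email"]))                            # 1
print(unmapped_values(contacts, "Status", {"Active", "Inactive"}))        # {'Actv': 1}
print(orphan_count(contacts, "AccountId", {a["Id"] for a in accounts}))  # 1
```

Run per object and per source, these numbers become the data quality assessment report described next.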

The profile output is a data quality assessment report that drives the cleansing plan and, critically, the timeline estimate. A data set with 8% duplicates and 95% field completeness migrates cleanly in a few weeks. A data set with 28% duplicates and 40% missing required fields might take 3–4 months of cleansing effort.

Phase 3: Cleanse

Cleansing is the most labor-intensive phase and the one most dependent on business judgment. Technical tools can identify duplicates, flag missing fields, and detect format inconsistencies, but they can’t decide which of two duplicate accounts is the “real” one, or whether a contact with no email address is still worth migrating.

Deduplication requires matching logic — rules that define when two records are the same entity. Exact email match is easy. Fuzzy name matching (is “Pfizer Inc.” the same account as “Pfizer Incorporated”?) requires threshold decisions that can generate false positives. Most migration projects use a combination of automated matching (flag likely duplicates) and manual review (a data steward or business owner confirms merge decisions for ambiguous cases).
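Here is a sketch of the “flag automatically, confirm manually” pattern, combining exact email matching with fuzzy name matching from Python’s standard-library difflib. The suffix list, the 0.9 threshold, and the field names are illustrative assumptions. Lowering the threshold catches more real duplicates but also produces more false positives.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical suffix list; extend it from what profiling shows in your data.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc", "ltd", "limited", "co", "company"}


def normalize_name(name):
    """Lowercase, drop punctuation, and strip legal suffixes: 'Pfizer Inc.' -> 'pfizer'."""
    tokens = re.findall(r"[a-z0-9]+", (name or "").lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def candidate_duplicates(records, threshold=0.9):
    """Flag likely duplicate pairs for a data steward to confirm. Nothing is merged here."""
    pairs = []
    for a, b in combinations(records, 2):  # O(n^2): fine for a sketch, use blocking at scale
        email_a = (a.get("Email") or "").strip().lower()
        email_b = (b.get("Email") or "").strip().lower()
        if email_a and email_a == email_b:
            pairs.append((a["Id"], b["Id"], "exact_email", 1.0))
            continue
        name_a, name_b = normalize_name(a.get("Name")), normalize_name(b.get("Name"))
        if not name_a or not name_b:
            continue  # two blank names would otherwise score a perfect match
        score = SequenceMatcher(None, name_a, name_b).ratio()
        if score >= threshold:
            pairs.append((a["Id"], b["Id"], "fuzzy_name", round(score, 2)))
    return pairs


records = [
    {"Id": "1", "Name": "Pfizer Inc.", "Email": ""},
    {"Id": "2", "Name": "Pfizer Incorporated", "Email": None},
    {"Id": "3", "Name": "Acme Corp", "Email": "ops@acme.com"},
    {"Id": "4", "Name": "Acme Holdings", "Email": "OPS@acme.com"},
    {"Id": "5", "Name": "Globex", "Email": ""},
]
pairs = candidate_duplicates(records)
```

Both Pfizer names normalize to `pfizer`, so that pair is flagged for review. Pairwise comparison grows quadratically, so at real volumes you would first group records by a cheap key (postal code, first letters of the name) and compare only within each group.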

Normalization means standardizing values to match your target system’s data model. If your source system has a “Status” field with 23 possible values and your Salesforce picklist has 8, you need a mapping table that defines how each source value translates. This mapping requires business decisions, not just technical ones.
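A mapping table can live in code as a plain dictionary. This sketch, using hypothetical status values, applies the mapping and returns any source values that nobody has mapped yet instead of guessing at them:

```python
# Hypothetical mapping signed off by the business owner: source Status -> Salesforce picklist value.
STATUS_MAP = {
    "Active": "Active",
    "ACTV": "Active",
    "Prospect - Hot": "Prospect",
    "Prospect - Cold": "Prospect",
    "Closed/Lost": "Inactive",
}


def apply_mapping(records, mapping, field="Status"):
    """Return (mapped_records, unmapped_values); unmapped records are held back, not guessed."""
    mapped, unmapped = [], set()
    for record in records:
        source_value = record.get(field)
        if source_value in mapping:
            mapped.append({**record, field: mapping[source_value]})
        else:
            unmapped.add(source_value)
    return mapped, unmapped


rows = [
    {"Id": "1", "Status": "ACTV"},
    {"Id": "2", "Status": "Prospect - Hot"},
    {"Id": "3", "Status": "Dormant"},
]
mapped, unmapped = apply_mapping(rows, STATUS_MAP)
```

Here `Dormant` comes back in `unmapped`. That value goes to the business owner as a decision to make. It is not something to default quietly.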

Relationship repair means resolving broken references before migrating. If 3,000 opportunity records point to accounts that have been merged or deleted in the source system, you need to decide what to do with them — re-link to the surviving account, migrate as orphaned records, or exclude them.
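Relationship repair often comes down to following the merge decisions made during deduplication. The sketch below assumes a hypothetical `merge_map` from each merged-away account ID to its survivor. What happens to references that can’t be resolved is an explicit parameter, because that is a business decision:

```python
def repair_parent_links(children, live_parent_ids, merge_map, unresolved="exclude", field="AccountId"):
    """
    Re-link children to surviving parents, following merge chains (A -> B -> C).
    merge_map: hypothetical {merged_away_id: surviving_id} produced during deduplication.
    unresolved: "orphan" keeps the child with an empty parent, "exclude" sets it aside.
    """
    kept, excluded = [], []
    for child in children:
        parent = child.get(field)
        seen = set()
        while parent in merge_map and parent not in seen:  # `seen` guards against cyclic merge maps
            seen.add(parent)
            parent = merge_map[parent]
        if parent in live_parent_ids:
            kept.append({**child, field: parent})
        elif unresolved == "orphan":
            kept.append({**child, field: None})
        else:
            excluded.append(child)
    return kept, excluded


opportunities = [
    {"Id": "O1", "AccountId": "A1"},
    {"Id": "O2", "AccountId": "A2"},  # A2 was merged into A5, which was later merged into A3
    {"Id": "O3", "AccountId": "A7"},  # A7 was deleted in the source system
]
kept, excluded = repair_parent_links(opportunities, {"A1", "A3"}, {"A2": "A5", "A5": "A3"})
```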

Phase 4: Transform

Transformation converts cleansed source data into the structure required by the target Salesforce org. This includes field mapping (source field X → Salesforce field Y), data type conversion (source date format → Salesforce date format), object relationship mapping (source customer ID → Salesforce Account ID after Account migration), and any calculated or derived fields that need to be generated during migration rather than migrated directly.
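The following sketch shows a mapping-driven transformation in plain Python. The source column names, the MM/DD/YYYY source date format, and the `Legacy_Created_Date__c` custom field are all assumptions for illustration. The point is that each rule is declared once and applied identically to every row:

```python
from datetime import datetime


def to_sf_date(value):
    """Assumes source dates like '03/15/2019'; Salesforce date fields take 'YYYY-MM-DD'."""
    return datetime.strptime(value.strip(), "%m/%d/%Y").date().isoformat() if value else None


def to_number(value):
    """Strip thousands separators: '1,250,000' -> 1250000.0."""
    return float(value.replace(",", "")) if value else None


# Hypothetical mapping: source column -> (Salesforce field, converter).
ACCOUNT_FIELD_MAP = {
    "cust_name": ("Name", lambda v: (v or "").strip()),
    "created": ("Legacy_Created_Date__c", to_sf_date),
    "annual_rev": ("AnnualRevenue", to_number),
}


def transform(row, field_map):
    """Apply the same mapping and conversion rules to every row."""
    return {target: convert(row.get(source)) for source, (target, convert) in field_map.items()}


print(transform({"cust_name": " Acme ", "created": "03/15/2019", "annual_rev": "1,250,000"}, ACCOUNT_FIELD_MAP))
# {'Name': 'Acme', 'Legacy_Created_Date__c': '2019-03-15', 'AnnualRevenue': 1250000.0}
```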

The transformation layer is where ETL tools (extract, transform, load) earn their keep. A well-configured transformation handles the mechanical mapping work at scale, applying consistent rules to millions of records faster and more accurately than manual processing.

Transformation output should be a set of migration-ready flat files or direct API payloads — one per Salesforce object, in the correct import format — with all source IDs preserved as external ID fields so post-migration validation can reconcile source-to-target.
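Here is one way to produce one file per object with source IDs preserved. `Legacy_Id__c` is a hypothetical external ID field. The `Account.Legacy_Id__c` column assumes a load path that resolves the parent by external ID, using the relationship notation that the Bulk API supports in CSV headers. Confirm this for your load tool. If your tool doesn’t support it, look up Salesforce Account IDs from the Account load’s success file instead.

```python
import csv
import io


def to_csv(rows, columns):
    """Render one object's migration-ready CSV; raises if a row has an unexpected column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="raise")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


source_accounts = [{"cust_id": "C-100", "name": "Acme"}]
source_contacts = [{"contact_id": "P-1", "cust_id": "C-100", "last": "Nguyen"}]

# Legacy_Id__c is a hypothetical external ID field holding the source system's primary key.
accounts_csv = to_csv(
    [{"Legacy_Id__c": a["cust_id"], "Name": a["name"]} for a in source_accounts],
    ["Legacy_Id__c", "Name"],
)
contacts_csv = to_csv(
    [{"Legacy_Id__c": c["contact_id"], "LastName": c["last"], "Account.Legacy_Id__c": c["cust_id"]}
     for c in source_contacts],
    ["Legacy_Id__c", "LastName", "Account.Legacy_Id__c"],
)
```

Keeping the source key on every record is what makes the post-migration reconciliation and any targeted reload possible.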

Phase 5: Load

Loading is inserting the transformed data into Salesforce. The mechanics are straightforward; the sequencing and error handling are where care is required.

Load order matters because of record relationships. You must load parent records before the child records that reference them: Accounts before Contacts and Opportunities, Opportunities (and the Pricebook Entries for their products) before Opportunity Line Items, and Contacts and Opportunities before the Opportunity Contact Roles that link them. If you load Contacts before Accounts, the Contacts have nowhere to link. In a complex data model with 15 objects and multiple relationship chains, the load sequence can get complicated.
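Working out the sequence for a larger model is a topological sort. Here is a sketch using Python’s standard `graphlib` module (Python 3.9+) over a simplified, hypothetical dependency map:

```python
from graphlib import TopologicalSorter

# Simplified, hypothetical dependency map: object -> the objects it looks up to (its parents).
DEPENDS_ON = {
    "Account": set(),  # ParentId is a self-lookup: load Accounts, then update ParentId in a second pass
    "Contact": {"Account"},
    "Opportunity": {"Account"},
    "Product2": set(),
    "PricebookEntry": {"Product2"},
    "OpportunityLineItem": {"Opportunity", "PricebookEntry"},
    "OpportunityContactRole": {"Opportunity", "Contact"},
}


def load_order(dependencies):
    """Parents before children. Raises graphlib.CycleError if the map contains a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


order = load_order(DEPENDS_ON)
```

Circular references, such as Account’s self-lookup `ParentId`, can’t be ordered this way. The usual approach is to load the records without that lookup and populate it in a second update pass.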

Load errors happen. A batch of 10,000 Contacts loads with 340 errors because 340 of them reference Account IDs that didn’t exist in the Account load (because those accounts had data quality issues that slipped through cleansing). Error logs need to be reviewed, the root cause identified, and the affected records reprocessed. This is normal; plan for it.
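Triage is faster when errors are grouped by cause instead of read row by row. This sketch assumes an error file shaped like Data Loader’s output (the original columns plus an `ERROR` column). The sample rows are made up:

```python
import csv
import io
from collections import Counter


def read_error_file(text, error_column="ERROR"):
    """Parse an error file (assumed: original columns plus an ERROR column) and count messages."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return rows, Counter(row[error_column] for row in rows)


def rows_to_retry(rows, error_code, error_column="ERROR"):
    """Rows that failed with a given status code, error column dropped, ready to fix and reload."""
    return [
        {k: v for k, v in row.items() if k != error_column}
        for row in rows
        if row[error_column].startswith(error_code)
    ]


# Made-up sample in the shape described above.
sample = (
    "Legacy_Id__c,LastName,AccountId,ERROR\n"
    "P-1,Nguyen,001XX,INVALID_CROSS_REFERENCE_KEY:invalid cross reference id\n"
    "P-2,Diaz,001YY,INVALID_CROSS_REFERENCE_KEY:invalid cross reference id\n"
    "P-3,,001ZZ,REQUIRED_FIELD_MISSING:Required fields are missing: [LastName]\n"
)
rows, by_message = read_error_file(sample)
for message, count in by_message.most_common():
    print(count, message)
```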

Large data volumes require attention to Salesforce API limits and bulk load strategy. The Salesforce Bulk API 2.0 supports loading up to 150 million records per 24-hour period and handles sets of up to 100 million records per job — more than adequate for most implementations but worth designing for if you’re migrating 10M+ records.

Tools: Matching the Right Tool to the Job

Salesforce Data Loader

Salesforce’s own free desktop tool. Handles CSV imports for any standard or custom object, supports both the standard API and Bulk API, and is the default choice for smaller migrations (up to a few hundred thousand records). Data Loader is adequate for simple object loads where your data is already clean and transformed. It doesn’t handle transformation, deduplication, or complex multi-object relationship mapping — you need to do all of that before you hand files to Data Loader.

Salesforce CLI (sf Data Commands)

The command-line alternative to the Data Loader UI. Scriptable, repeatable, and easier to automate in a pipeline than the desktop tool. The sf data import tree and sf data export tree commands handle related object sets (e.g., Accounts with their associated Contacts and Opportunities in one structured export/import). Useful for sandbox seeding and smaller-scale migrations where you want automation.

Jitterbit Data Loader (Free)

Jitterbit offers a free data migration tool that improves on Salesforce Data Loader with a more capable UI, better scheduling, and support for more complex mapping logic. Not a full ETL platform, but a meaningful upgrade for migrations where you need more than basic CSV import.

Informatica Cloud and PowerCenter

The enterprise standard for large-scale ETL in complex environments. Informatica has pre-built Salesforce connectors, handles transformation rules at scale, supports complex data lineage documentation (important for compliance environments), and has robust scheduling and monitoring. Informatica is the right choice for migrations involving millions of records, multiple source systems, or environments where data lineage documentation is required (life sciences, financial services). Cost: cloud edition starts around $2,000–$4,000/month for the connector and platform; on-premise PowerCenter licensing varies.

MuleSoft

Salesforce’s own integration platform. MuleSoft is more commonly used for real-time integration than for batch data migration, but it’s capable of both. If your organization already has MuleSoft licensed for integration purposes, using it for migration too is operationally efficient. The cost is the barrier for organizations that don’t already have it — MuleSoft licensing runs $50K–$200K/year depending on volume and deployment model.

Other ETL Options

Talend (open source option with good Salesforce connectors), Boomi (solid mid-market ETL with a clean Salesforce integration), and Fivetran (optimized for data warehouse loads rather than CRM migrations but sometimes applicable) round out the commonly used toolset. For small migrations with limited budget, Python scripts using the open-source simple-salesforce library are a legitimate option: inexpensive and flexible, but they require developer resources.

What “Data Quality” Actually Means

Data quality is thrown around as a phrase without anyone specifying what it means in practice. Four dimensions matter for Salesforce migration:

Completeness: Every record has values in all required (and business-required) fields. Salesforce won’t load a Contact without a LastName; it will load one without an Email. “Business required” goes beyond system-enforced requirements — an Account without an industry classification may load fine but will break your segmentation reports from day one.
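A pre-load gate can separate records Salesforce will reject from records it will accept but that will damage your reports. The field sets below are hypothetical. Only `LastName` on Contact and `Name` on Account reflect actual system requirements.

```python
# Hypothetical rule sets. System-required fields are enforced by Salesforce;
# business-required fields are your own standards.
SYSTEM_REQUIRED = {"Account": {"Name"}, "Contact": {"LastName"}}
BUSINESS_REQUIRED = {"Account": {"Industry"}, "Contact": {"Email"}}


def missing_fields(sobject, record):
    required = SYSTEM_REQUIRED.get(sobject, set()) | BUSINESS_REQUIRED.get(sobject, set())
    return sorted(f for f in required if not str(record.get(f) or "").strip())


def completeness_gate(sobject, records):
    """Split records into (ready, flagged, blocked); flagged/blocked entries carry their missing fields."""
    ready, flagged, blocked = [], [], []
    system = SYSTEM_REQUIRED.get(sobject, set())
    for record in records:
        missing = missing_fields(sobject, record)
        if system.intersection(missing):
            blocked.append((record, missing))  # Salesforce would reject these
        elif missing:
            flagged.append((record, missing))  # loads, but breaks segmentation and reports
        else:
            ready.append(record)
    return ready, flagged, blocked


ready, flagged, blocked = completeness_gate("Account", [
    {"Name": "Acme", "Industry": "Retail"},
    {"Name": "Globex", "Industry": ""},
    {"Name": "", "Industry": "Energy"},
])
```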

Deduplication: Each entity in the real world (each company, each contact, each opportunity) is represented by exactly one record. Duplicate records are the most corrosive long-term quality issue in any CRM — they fragment history, distort pipeline metrics, and create confusion for every user who touches the affected accounts. Plan for deduplication before migration, not as a post-migration cleanup.

Normalization: Consistent representation of the same values. “NY,” “N.Y.,” “New York,” and “new york” in a State field all mean the same thing but will break geo-based reporting, territory assignment, and any filter that looks for exact matches. Normalization maps all equivalent values to a single canonical form before migration.
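Here is a sketch of normalizing free-text values to a canonical form: clean each value into a comparison key, then look the key up. The lookup table is a hypothetical fragment. Unrecognized values return `None` so they go to a reviewer rather than passing through unchanged.

```python
import re

# Hypothetical fragment of a lookup table keyed on the cleaned value.
STATE_CANONICAL = {"ny": "NY", "newyork": "NY", "ca": "CA", "california": "CA"}


def clean_key(value):
    """Lowercase and keep letters only, so 'N.Y.', 'ny', and 'NY' share one key."""
    return re.sub(r"[^a-z]", "", (value or "").lower())


def normalize_state(value):
    """Canonical value, or None so unrecognized input is routed to review."""
    return STATE_CANONICAL.get(clean_key(value))


for raw in ["NY", "N.Y.", "New York", "new york", "Nwe Yrok"]:
    print(repr(raw), "->", normalize_state(raw))
```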

Relationship integrity: Records that reference other records do so correctly. Every Contact’s AccountId references a valid Account. Every OpportunityLineItem’s OpportunityId references a valid Opportunity. Broken relationships are usually invisible until someone tries to navigate them in the UI or run a report — and then they produce confusing data rather than obvious errors.

The Sandbox-First Approach

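Every production data migration should be preceded by at least two passes in a sandbox that mirrors the production org configuration. A Developer Pro sandbox (1GB of data storage) can host a sample pass, but a full-volume rehearsal usually requires a Full Copy sandbox, which has the same storage allocation as production.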

The first sandbox pass is a test migration using a representative sample of data (10–20% of total volume, selected to include edge cases and known problem records). The goal is to validate your transformation logic, identify load errors, and confirm that the data looks correct in Salesforce after loading. This pass will find problems — that’s the point.
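Here is a sketch of building that first-pass sample: a seeded random slice topped up with every known problem record, so reruns test the same data. The field names and the 15% default are assumptions.

```python
import random


def build_test_sample(records, problem_ids, fraction=0.15, seed=42):
    """
    Return roughly `fraction` of `records`: every known problem record plus a
    seeded random selection of the rest, so each rerun uses the same sample.
    """
    rng = random.Random(seed)
    problems = [r for r in records if r["Id"] in problem_ids]
    others = [r for r in records if r["Id"] not in problem_ids]
    target = round(len(records) * fraction)
    extra = max(0, target - len(problems))
    return problems + rng.sample(others, min(extra, len(others)))


accounts = [{"Id": str(i)} for i in range(1000)]
sample = build_test_sample(accounts, problem_ids={"7", "42"})
```

For related objects, sample parents first and then pull the children of the sampled parents. Sampling each object independently produces relationship errors that exist only in the test.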

The second sandbox pass is a full-volume migration rehearsal. Load all data into the sandbox, validate completeness and accuracy, and measure the elapsed load time. The time measurement matters because production migrations typically need to be performed in a defined window (overnight, over a weekend) with minimal disruption to the business. If your full migration takes 36 hours in rehearsal, your cutover plan needs to accommodate that.
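The rehearsal measurement feeds straight into the cutover plan. Here is a back-of-the-envelope sketch. It assumes objects load one after another, and it uses a hypothetical 1.5× buffer for retries plus a fixed allowance for validation.

```python
def cutover_hours(record_counts, rehearsal_rates, buffer=1.5, validation_hours=4.0):
    """
    record_counts: production records per object.
    rehearsal_rates: records per hour per object, measured in the full-volume rehearsal.
    buffer: hypothetical multiplier for retries and error reprocessing.
    validation_hours: hypothetical allowance for post-load checks before go/no-go.
    Assumes objects load sequentially, which is conservative.
    """
    load_hours = sum(record_counts[obj] / rehearsal_rates[obj] for obj in record_counts)
    return load_hours * buffer + validation_hours


estimate = cutover_hours(
    {"Account": 200_000, "Contact": 600_000},
    {"Account": 100_000, "Contact": 150_000},
)
print(estimate, estimate <= 48)  # 13.0 True: fits a weekend window
```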

Only after two successful sandbox passes should you execute the production migration. This sequence prevents the worst-case outcome — discovering in production, during go-live weekend, that your migration has a systematic error affecting 30% of records.

Rollback Planning

Every migration plan needs a documented rollback procedure. In practice, true rollback — deleting everything that was loaded and restoring the source system to production status — is rarely executed, but having the plan clarifies decision thresholds and forces you to think through dependencies.

The practical components of a rollback plan: a record of the pre-migration Salesforce state (a full export of the production org taken immediately before the load, for comparison), documented criteria for when rollback will be triggered (e.g., more than X% of records fail validation, or a critical data category has more than Y errors), and a technical procedure for mass deleting the migrated records, for example through Data Loader’s delete or hard delete operations running over the Bulk API.
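Writing the trigger criteria down as code, or at least as a checklist with numbers, makes them hard to renegotiate at 2 a.m. on go-live weekend. Here is a sketch with hypothetical thresholds:

```python
def should_roll_back(results, max_failure_rate=0.05, critical=None):
    """
    results: {object: (records_attempted, records_failed)} from validation.
    critical: hypothetical {object: max_errors} for categories where a few errors are unacceptable.
    Returns a list of reasons; empty means proceed (with targeted remediation if needed).
    """
    critical = critical or {}
    reasons = []
    attempted = sum(a for a, _ in results.values())
    failed = sum(f for _, f in results.values())
    if attempted and failed / attempted > max_failure_rate:
        reasons.append(f"overall failure rate {failed / attempted:.1%} exceeds {max_failure_rate:.1%}")
    for obj, limit in critical.items():
        errors = results.get(obj, (0, 0))[1]
        if errors > limit:
            reasons.append(f"{obj}: {errors} errors exceeds limit of {limit}")
    return reasons


results = {"Account": (10_000, 50), "Opportunity": (40_000, 300)}
print(should_roll_back(results, critical={"Opportunity": 100}))
```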

For most mid-market migrations, the more practical recovery mechanism than full rollback is targeted remediation — identifying the affected records, correcting the source data, and reloading the affected subset. This is faster and less disruptive than a full rollback when the problem is isolated.

Post-Migration Validation

Migration is not complete at load. Validation is required to confirm that the data in Salesforce matches the source and that business operations are functional.

Validation approach: reconcile record counts by object (Salesforce count = source count minus any excluded records, with every exclusion documented), spot-check a statistical sample of records for accuracy (compare source record to Salesforce record field by field), run reports that were running in the old system and compare outputs, and have business users perform UAT against their daily workflows with real data.
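The count reconciliation and the field-by-field spot check are both mechanical. Here is a sketch with hypothetical object counts and field names. Compare against the transformed source values, since dates and picklists will have been converted during migration.

```python
def reconcile_counts(source_counts, target_counts, excluded_counts):
    """Expected Salesforce count = source count minus documented exclusions."""
    mismatches = {}
    for obj, source in source_counts.items():
        expected = source - excluded_counts.get(obj, 0)
        actual = target_counts.get(obj, 0)
        if actual != expected:
            mismatches[obj] = {"expected": expected, "actual": actual}
    return mismatches


def field_diffs(source_record, target_record, field_map):
    """Field-by-field comparison for one sampled record; field_map is source -> Salesforce field."""
    return {
        sf_field: (source_record.get(src), target_record.get(sf_field))
        for src, sf_field in field_map.items()
        if source_record.get(src) != target_record.get(sf_field)
    }


print(reconcile_counts({"Account": 1000, "Contact": 5000}, {"Account": 980, "Contact": 4990}, {"Account": 20}))
print(field_diffs({"name": "Acme", "city": "Austin"}, {"Name": "Acme", "BillingCity": "Dallas"},
                  {"name": "Name", "city": "BillingCity"}))
```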

Validation should be performed by business stakeholders, not just the technical migration team. The sales manager who knows that the “Smith Distributors” account should have $2.3M in closed-won opportunities from 2024 will catch a data issue that a technical validator running record count reconciliations won’t.

Data You Shouldn’t Migrate: Archive vs. Access

Not all data belongs in your production Salesforce org. Old records that no one actively uses consume storage, clutter search results, and add noise to reports. Salesforce storage costs real money: Enterprise Edition orgs get 10GB of data storage plus 20MB per user license (Unlimited and Performance Editions get 120MB per user), and additional storage is priced at a steep premium, commonly quoted at around $125/month per 500MB.
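Here is a rough storage check to run before deciding what to migrate. It assumes a 10GB base plus 20MB per user license, and roughly 2KB counted per record, which is how Salesforce counts most objects (some object types count differently). Treating 1GB as 1,024MB is also an assumption.

```python
# Assumptions: 10GB base + 20MB per user license, ~2KB counted per record
# for most objects, and 1GB treated as 1,024MB.
KB_PER_RECORD = 2


def allocation_mb(user_licenses, base_gb=10, per_user_mb=20):
    """Data storage allocation in MB."""
    return base_gb * 1024 + user_licenses * per_user_mb


def records_mb(record_count, kb_per_record=KB_PER_RECORD):
    """Approximate data storage consumed by migrated records, in MB."""
    return record_count * kb_per_record / 1024


needed = records_mb(3_000_000)
available = allocation_mb(200)
print(f"{needed:,.0f}MB needed of {available:,}MB allocated")  # 5,859MB needed of 14,240MB allocated
```

With 200 user licenses and 3 million migrated records, that comes to about 5,859MB of records against a 14,240MB allocation, before counting any data already in the org.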

For historical records beyond 3–5 years, consider: excluding from migration entirely if they have no business value, migrating to a Salesforce archive object with limited fields and a read-only interface (keep in mind that most records count as roughly 2KB of data storage however few fields they populate, so the gain is less clutter rather than cheaper storage per record), loading into a data warehouse (Salesforce Data Cloud, Snowflake, or BigQuery) where they can be queried but aren’t consuming CRM storage, or maintaining in the legacy system with read-only access for a defined sunset period.

Define the retention and access policy before the migration project starts. Changing it mid-project adds scope; discovering you made the wrong decision post-go-live creates a secondary remediation project.

Cost Reality

Migration cost is driven by data volume, data quality, the number of source systems, and the number of objects in scope.

Simple migration (one source system, 3–5 objects, fewer than 500K records, reasonably clean data): $15K–$40K. This is a straightforward mapping and load exercise with basic deduplication.

Mid-complexity migration (one to two source systems, 6–10 objects, 500K–5M records, moderate quality issues, some custom transformation required): $40K–$80K.

Complex migration (multiple source systems, 10+ objects, 5M+ records, significant deduplication and cleansing effort, compliance documentation requirements, enterprise ETL tooling): $80K–$200K+.

These figures cover migration design, tool configuration, sandbox testing, production migration execution, and post-migration validation. They don’t cover license costs for ETL tools, any data governance work that predates the migration project, or the business time required for stakeholders to make data quality decisions and sign off on validation.

The organizations that spend $15K on migration for a project that needed $80K almost always spend $80K eventually — they just spend it in post-go-live cleanup, support calls, and manual data correction over the following 12 months.

Estarei is a boutique Salesforce consulting firm built by ex-Salesforce employees. We’ve run data migrations from Dynamics, HubSpot, Oracle, SAP, legacy Salesforce orgs, and Excel — and we’ve seen what happens when migration gets underestimated. If you’re planning a Salesforce implementation, book a free consultation and we’ll give you an honest assessment of your migration scope.