MergeDatasets

Merges datasets holding overlapping cases but different variables. The merge may be controlled by keys or grouping variables.

Properties

Name Type   Description
MergeFiles MergeFileDescription 2..n Description of files to be merged.
MergeByVariables VariableReferenceBase 0..1 A variable or list of variables that acts as the unique case identifier across datasets. If MergeByVariables is absent, MergeType must be “sequential” on all files.
FirstVariable string 0..1 The name of a variable set to 1 for the first row of each group of cases with the same value for the MergeByVariables variables and set to 0 for all other rows.
LastVariable string 0..1 The name of a variable set to 1 for the last row of each group of cases with the same value for the MergeByVariables variables and set to 0 for all other rows.

Properties Inherited from TransformBase

Name Type   Description
ProducesDataframe DataframeDescription 0..n Signify the dataframe which this transform produces.
ConsumesDataframe DataframeDescription 0..n Signify the dataframe which this transform acts upon.

Properties Inherited from CommandBase

Name Type   Description
Command string 1..1 The type of command
SourceInformation SourceInformation 0..n Information about the source of the command.
MessageText string 0..n Adds a message that can be displayed with the command.

Item Type Hierarchy

Relationships

stub

Merge_options

Properties and Options of MergeFileDescription

Property name Description
FileName The names of the files to be merged. “Active file” means the file current active dataset.
_  
MergeType Describes the type of merge performed.
  > Sequential: Match rows from each input > dataframe in sequential order. > > OneToOne: Create one row for each value of > the mergeByVariables. If a combination > of the mergeByVariables is repeated, > only one row is matched. Rows with > repeated combinations of the > MergeByVariables may or may not be > included in the output file depending on > the newRow property. > > OneToMany: Create a row in the output > dataframe by matching rows in this > dataframe to every row in other dataframes > with the same value of MergeByVariables. > Note that OneToMany implies that one of > the other input datarames is set to > ManyToOne. > > ManyToOne: Create a row in the output > dataframe by matching all rows in this > dataframe to the one row in the other > dataframe with the same value of > MergeByVariables. > > Cartesian: Create a new row in the output > dataframe for every possible combination > of rows having the same value of > MergeByVariables. This is equivalent to a > many to many merge. R and Python use a > model derived from SQL, which is based on > Cartesian joins. > > Unmatched: Create a new row for every row > that cannot be matched on the > MergeByVariables > > SASmatchMerge: SAS uses a merging approach > that combines matching keys and sequential > merges within groups.
MergeFlagVariable Creates a new variable indicating whether the row came from this file or a different input file.
RenameVariables Variables to be renamed
_  
Update Describes outcome when a variable exists in both this file and another file.
  > Master: This dataframe is the Master > dataframe. > > Ignore: If a column with the same name > exists in the Master dataframe, ignore the > values in this dataframe. > > FillNew: If a column with the same name > exists in the Master dataframe, use the > values from this dataframe only in new > rows created from this dataframe. > > UpdateMissing: If a column with the same > name exists in the Master dataframe, use > values from this dataframe when the value > in the Master dataframe is missing. Rows > not in the Master dataframe are filled > from this dataframe. > > Replace: If a column with the same name > exists in the Master dataframe, use values > from this dataframe.
NewRow When TRUE, generates a new row when not matched to other files
KeepVariables List of variables to keep
DropVariables List of variables to drop
KeepCasesCondition Logical condition for keeping rows.
DropCasesCondition Logical condition for dropping rows.
MergeByNames > An ordered list of variables used as keys > in this file to be matched to the > variables in the mergeByVariables property > of the MergeDatasets command. This > property is only used when the key > variables in this file have different > names than the variable names listed in > the MergeDatasets command.

SPSS_merge_examples

==================== EXAMPLE 1 ==================================== ``` MATCH FILES

/FILE=’merge_1.sav’
/file=’merge_2.sav’
.
{“command”: “MergeDatasets”,
“$type”: “MergeDatasets”,

“MergeFiles”: [ “mergeFileDescription”:

{“fileName”: “merge_1.sav”, “mergeType”: “Sequential”,

“newRow”: TRUE

},

“MergeFileDescription”:

{“fileName”: “merge_2.sav”, “mergeType”: “Sequential”

“newRow”: TRUE

}

```

==================== EXAMPLE 2 ==================================== ``` MATCH FILES

/FILE=’merge_1.sav’

/in=from_f1 /file=’merge_3.sav’ /in=from_f3 /RENAME= (VAR3=VARx) /KEEP= id VAR2 VARx /by id /first=firstvar /last=lastvar

.

```

```

{“command”: “MergeDatasets”,
“$type”: “MergeDatasets”,
“mergeByVariables”: [ {“$type”: “VariableSymbolExpression”,
“VariableName”:”id”} ],

“firstVariable”: “firstvar”, “lastVariable”: “lastvar”,

“mergeFiles”: [
“mergeFileDescription”:

{“fileName”: “merge_1.sav”, “mergeType”: “OneToOne”, “mergeFlagVariable”:”from_f1”, “renameVariable”:[“RenamePair”:

{“OldVariable”:”VAR3”,”NewVariable”:”VARx”} ],

“newRow”: TRUE },

“mergeFileDescription”:
{“fileName”: “merge_3.sav”, “mergeType”: “OneToOne”, “mergeFlagVariable”:”from_f3”, “newRow”: FALSE }

},

{“command”: “KeepVariables”,

“$type”: “KeepVariables”, “variables”: {“$type”: “VariableListExpression”,

“variables”:
[ {“$type”: “VariableSymbolExpression”,
“VariableName”:”id”}, {“$type”: “VariableSymbolExpression”, “VariableName”:”VAR2”}, {“$type”: “VariableSymbolExpression”, “VariableName”:”VARx”} ]

},

“messageText”: “NOTE: This KeepVariables command is after the MergeDatasets command, because it applies to the output dataframe.” }

```

Stata_merge_examples

``` NOTE: These Stata Merge options are not represented in SDTL:

noreport nolabel nonotes sorted

```

==================== EXAMPLE 1 ====================================

``` use “mergedat1.dta”, clear merge 1:1 _n using “mergedat4.dta” list _all

{“command”: “MergeDatasets”,

“mergeFiles”: [ “mergeFileDescription”:

{“fileName”: “Active file”, “mergeType”: “Sequential”, “newRow”: TRUE, “mergeFlagVariable”:”_merge”},
“mergeFileDescription”:
{“fileName”: “mergedat4.dta”, “mergeType”: “Sequential”, “newRow”: TRUE} ]

},

{“$type”: “SetValueLabels”,

“command”: “SetValueLabels”, “variables”: [

{“$type”: “VariableSymbolExpression”,
“variableName”, “_merge”}

],

“labels”: [
{“value”: 1, “label”: “master”} {“value”: 2, “label”: “using”} {“value”: 3, “label”: “match”} {“value”: 4, “label”: “match_update”} {“value”: 5, “label”: “match_conflict”} ]

}

` ====================  EXAMPLE 2   ==================================== ` use “mergedat1.dta”, clear merge 1:1 id using “mergedat3b.dta” , update gener(matchVar) list _all

{“command”: “MergeDatasets”,

“MergeFiles”: [ “MergeFileDescription”:

{“fileName”: “Active Dataframe”, “mergeType”: “1:1”, “update”: “Master”, “mergeFlagVariable”:”matchVar”, “newRow”: TRUE},
“MergeFileDescription”:
{“fileName”: “mergedat3c.dta”, “mergeType”: “1:1”, “update”: “UpdateMissing”, “newRow”:TRUE} ],
“MergeByVariables”: {“$type”: “VariableSymbolExpression”,
“VariableName”:”id”}

},

{“$type”: “SetValueLabels”,

“command”: “SetValueLabels”, “variables”: [

{“$type”: “VariableSymbolExpression”,
“variableName”, “matchVar”}

],

“labels”: [
{“value”: 1, “label”: “master”} {“value”: 2, “label”: “using”} {“value”: 3, “label”: “match”} {“value”: 4, “label”: “match_update”} {“value”: 5, “label”: “match_conflict”} ]

}

` ====================  EXAMPLE 3   ==================================== ` use “mergedat1.dta”, clear merge 1:1 id using “mergedat3b.dta” , update replace keepusing(lastname)

{“command”: “MergeDatasets”,
“$type”: “MergeDatasets”,
“mergeByVariables”:{“$type”: “VariableSymbolExpression”,
“variableName”:”id”},

“mergeFiles”: [ “mergeFileDescription”:

{“mileName”: “Active file”, “mergeType”: “1:1”, “update”: “UpdateMissing”, “mergeFlagVariable”:”matchVar”, “newRow”:”False”},
“mergeFileDescription”:

{“fileName”: “mergedat3b.dta”, “mergeType”: “1:1”, “update”: “Master”, “newRow”:”True”, “keepVariables”:{“$type”: “VariableSymbolExpression”,

“variableName”:”lastname”}

},

{“$type”: “SetValueLabels”,

“command”: “SetValueLabels”, “variables”: [

{“$type”: “VariableSymbolExpression”,
“variableName”, “_merge”}

],

“labels”: [
{“value”: 1, “label”: “master”} {“value”: 2, “label”: “using”} {“value”: 3, “label”: “match”} {“value”: 4, “label”: “match_update”} {“value”: 5, “label”: “match_conflict”} ]

}

```