SDTL Command Language [1]_
==========================

Overview
--------

SDTL has been defined as a set of JSON (JavaScript Object Notation)
schemas. Each schema is a “type” (“$type” in the schemas), which can
include one or more “properties”. Properties are described by name/value
pairs. The value of a property may be another SDTL type, and SDTL types
are often nested in this way. A value may also be an array, which is an
ordered list of values. However, all values are ultimately resolved into
numbers, text strings, or boolean constants (True/False).

Figure 2 gives an example illustrating how SDTL types are nested inside
other types. The source command is "RENAME VARIABLES (var10 = Age).“,
which changes the name of variable “var10” to “Age” in the SPSS
language. The corresponding SDTL command is **Rename**, which is given
in both the “$type” and “command” properties in Figure 1. The original
SPSS version is given in the **originalSourceText** of the
**sourceInformation** property. The SDTL **Rename** command has only one
property, **renames**, which can include one or more values of type
**RenamePair**. In this case, we have only one **RenamePair**, which has
two properties **oldVariable** and **newVariable**, each of which must
be a **VariableSymbolExpression** type. A **VariableSymbolExpression**
has only one property, **VariableName**, which must be a string. These
strings are the old and new names for the variable, “var10” and “Age”.

SDTL is much more verbose than the SPSS version of the same command, but
SDTL is much easier for a computer to manipulate. Every element in SDTL
has a “$type” explaining where it fits in the SDTL schema. [2]_ We know
that “var10” is the old variable name, because it is tagged with
**oldVariable**. In contrast, a computer program would need to include
code reproducing the syntax rules of the SPSS language to identify
“var10” as the variable to be renamed in the original SPSS command.

Documentation and downloads for the SDTL Command Language are available
at http://c2metadata.gitlab.io/sdtl-docs/. The COGS documentation system
provides detailed descriptions of all SDTL types, and some descriptions
are supplemented with examples and other documents.

Figure 2. Example SDTL: Rename command

.. code:: json

    {
        "$type": "Rename",
        "command": "rename",
        "sourceInformation": {
            "originalSourceText": "RENAME VARIABLES (var10 = Age)."
        },
        "renames": [
            {
                "$type": "RenamePair",
                "oldVariable": {
                    "variableName": "var10",
                    "$type": "VariableSymbolExpression"

                },
                "newVariable": {
                    "variableName": "Age",
                    "$type": "VariableSymbolExpression"
                }
            }
        ]
    }


SDTL Type Hierarchy
-------------------

Types in SDTL are organized into groups, called “bases”, according to
the hierarchy shown in Figure 3.

.. image:: sdtl-type-hierarchy.png

Figure 3. Type Hierarchy in SDTL

CommandBase
~~~~~~~~~~~

The SDTL types in CommandBase describe actions, like merging files or
modifying the contents of a variable. CommandBase has two
sub-categories:

-  TransformBase includes commands that change data or metadata, like
   **Compute**, **Rename**, **MergeDatasets**, and **SetDataType**

-  InformBase commands provide information, like **Comment** and
   **Message**

All commands inherit properties from CommandBase (**command**,
**sourceInformation**, and **messageText**).

TransformBase
~~~~~~~~~~~~~

The types belonging to TransformBase are commands that change data or
metadata. All commands in TransformBase also inherit properties from
CommandBase to which it belongs:

+-----------------------+---------------------------------------------+
| **producesDataframe** | Identifies the dataframes which this        |
|                       | transform produces.                         |
+-----------------------+---------------------------------------------+
| **consumesDataframe** | Identifies the dataframes which this        |
|                       | transform acts upon.                        |
+-----------------------+---------------------------------------------+

In Table 1 the commands in TransformBase are arranged into five
functional sub-groups.

1. Commands that create variables or change the values of a variable.
   Only four SDTL commands add or modify variables.

2. Commands that change the metadata associated with a variable or
   dataframe.
   Statistical packages manage several types of metadata: variable
   names, labels, data type, display properties.

3. Commands that read or write files.

4. Commands that change the structure of a dataframe.
   The structure of a dataframes is changed by adding or dropping
   columns or rows.

5. Commands that control the flow of operations in a script
   Command execution can be conditional (if.. then..else) or iterative.

InformBase
----------

Table 2 shows informational commands that do not describe changes to the
data. Although SDTL does not include commands that analyze data, these
commands can be transcribed verbatim in an SDTL script with the
**Analysis** command. **Unsupported** is used for commands that our
Parser cannot translate into SDTL. **Invalid** is used when the parser
recognizes the command in the source language but its syntax does not
conform to expectations. **NoTransformOp** is used for source language
commands that do not have a function in SDTL, like invoking a library of
functions in the source language.

ExpressionBase
--------------

The SDTL types in Table 3 (ExpressionBase) are used in expressions,
which may be numeric, text or logical. The most powerful of these types
is **FunctionCallExpression**, which is discussed below.
VariableReferenceBase (Table 4) is a subcategory of ExpressionBase used
to describe the variables in an expression.

ExpressionBase includes types to describe lists and ranges of variables,
numbers, and strings. Commands like **Recode** and **SetDataFormat** are
often applied to many variables at the same time. The
**VariableRangeExpression** identifies variables in contiguous columns
in the dataframe by specifying the first and last variables in the set.
A **VariableListExpression** consists of a list of individual variables
(**VariableSymbolExpression**) and ranges (**VariableRangeExpression**).

Commands and functions may also require a list of values, like the
**cut_list()** function which divides a numeric variable into categories
at specified breakpoints. A **ValueListExpression** may consist of
numbers or strings (**NumericConstantExpression**,
**StringConstantExpression**) or ranges of numbers or strings
(**NumberRangeExpression**, **StringRangeExpression**). In SDTL a list
of values is a single expression, which simplifies the description of
functions.

Types for Complex Properties
----------------------------

Table 5 includes types that were created to represent complex properties
of other commands. For example, **SourceInformation** is a property of
every command in CommandBase that shows the source of this command in
its original language. In addition to the text of the original command,
**SourceInformation** provides the location of the original command in
the command script. **SourceInformation** has the following properties.

-  **lineNumberStart**: The line number of the beginning of the
   transform code

-  **lineNumberEnd**: The line number of the end of the transform code

-  **sourceStartIndex**: The character index of the beginning of the
   transform code

-  **sourceStopIndex**: The character index of the end of the transform
   code

-  **originalSourceText**: The original source code of the data
   transform code

-  **processedSourceText**: The source code of the data transform code
   after processing macros or loops

Although **sourceInformation** is only one property of a command, the
**SourceInformation** type allows it to include six types of
information.

+----------------------------------+----------------------------------+
| **Table 1**                                                         |
|   **TransformBase: SDTL Types that Change a Dataframe**             |
+----------------------------------+----------------------------------+
| \A. Commands that create                                            |
| variables or change the                                             |
| values of a variable                                                |
+----------------------------------+----------------------------------+
| **Aggregate**                    | An aggregation summarizes data   |
|                                  | using aggregation functions      |
|                                  | applied to data that may be      |
|                                  | grouped by one or more           |
|                                  | variables. The resulting summary |
|                                  | data is added to each row of the |
|                                  | existing dataset. The SDTL       |
|                                  | Collapse command is used when    |
|                                  | the summary data is used to      |
|                                  | create a new dataframe with one  |
|                                  | row per group..                  |
+----------------------------------+----------------------------------+
| **Compute**                      | Assigns the value of an          |
|                                  | expression to a variable.        |
+----------------------------------+----------------------------------+
| **Recode**                       | Describes recoding values in one |
|                                  | or more variables according to a |
|                                  | specified mapping. The Recode    |
|                                  | command can either describe a    |
|                                  | recoding of one or more          |
|                                  | individual variables, or a range |
|                                  | of variables. When one or more   |
|                                  | individual variables are         |
|                                  | described, a new variable name   |
|                                  | can be specified. In this case,  |
|                                  | the original variable is left    |
|                                  | alone, and a new variable is     |
|                                  | created with the recoded values. |
+----------------------------------+----------------------------------+
| **SetMissingValues**             | Defines values that are treated  |
|                                  | as missing values for a list of  |
|                                  | variables.                       |
+----------------------------------+----------------------------------+
|                                  |                                  |
+----------------------------------+----------------------------------+
| \B. Commands that change the                                        |
| metadata associated with a                                          |
| variable or dataframe                                               |
+----------------------------------+----------------------------------+
| **Rename**                       | Rename changes the name of a     |
|                                  | variable or list of variables.   |
+----------------------------------+----------------------------------+
| **SetDatasetProperty**           | Changes a property of a          |
|                                  | dataframe.                       |
+----------------------------------+----------------------------------+
| **SetDataType**                  | Sets the data type of a variable |
|                                  | or list of variables.            |
+----------------------------------+----------------------------------+
| **SetDisplayFormat**             | Sets the display or output       |
|                                  | format for a variable or list of |
|                                  | variables.                       |
+----------------------------------+----------------------------------+
| **SetValueLabels**               | Describes the assignment of      |
|                                  | labels to categorical values.    |
+----------------------------------+----------------------------------+
| **SetVariableLabel**             | Describes the assignment of a    |
|                                  | label to a variable.             |
+----------------------------------+----------------------------------+
|                                  |                                  |
+----------------------------------+----------------------------------+
| \C. Commands that read or write                                     |
| files                                                               |
+----------------------------------+----------------------------------+
| **Load**                         | Load data from a file.           |
+----------------------------------+----------------------------------+
| **Save**                         | Writes a dataset to a file.      |
+----------------------------------+----------------------------------+
|                                  |                                  |
+----------------------------------+----------------------------------+
| \D. Commands that change the                                        |
| structure of a dataframe                                            |
+----------------------------------+----------------------------------+
| **AppendDatasets**               | Combines datasets by             |
|                                  | concatenation for datasets with  |
|                                  | the same or overlapping          |
|                                  | variables.                       |
+----------------------------------+----------------------------------+
| **Collapse**                     | A collapse command summarizes    |
|                                  | data using aggregation functions |
|                                  | applied to data that may be      |
|                                  | grouped by one or more           |
|                                  | variables. The resulting summary |
|                                  | data is represented in a new     |
|                                  | dataset. See Aggregate for       |
|                                  | adding summary variables without |
|                                  | changing the number of rows.     |
+----------------------------------+----------------------------------+
| **DropCases**                    | Rows that match the selection    |
|                                  | condition are deleted in the     |
|                                  | dataset. Other rows are          |
|                                  | retained.                        |
+----------------------------------+----------------------------------+
| **DropVariables**                | Deletes variables from the       |
|                                  | dataset.                         |
+----------------------------------+----------------------------------+
| **KeepCases**                    | Rows that match the selection    |
|                                  | condition are retained in the    |
|                                  | dataset. Other rows are deleted. |
+----------------------------------+----------------------------------+
| **KeepVariables**                | Variables to be retained in the  |
|                                  | dataset. Variables not on the    |
|                                  | list are deleted.                |
+----------------------------------+----------------------------------+
| **MergeDatasets**                | Merges datasets holding          |
|                                  | overlapping cases but different  |
|                                  | variables. The merge may be      |
|                                  | controlled by keys or grouping   |
|                                  | variables.                       |
+----------------------------------+----------------------------------+
| **NewDataframe**                 | Creates a new empty dataframe.   |
|                                  | Numbers of rows or columns may   |
|                                  | be specified. All values are     |
|                                  | assumed to be missing.           |
+----------------------------------+----------------------------------+
| **ReshapeLong**                  | Creates a new dataset with       |
|                                  | multiple rows per case by        |
|                                  | assigning a set of variables in  |
|                                  | the original dataset to a single |
|                                  | variable in the new dataset.     |
+----------------------------------+----------------------------------+
| **ReshapeWide**                  | ReshapeWide is not supported in  |
|                                  | the current version of SDTL,     |
|                                  | because it depends on values in  |
|                                  | the data. However, it may be     |
|                                  | useful when values of the index  |
|                                  | variable are available in the    |
|                                  | metadata file or the data can be |
|                                  | processed.                       |
+----------------------------------+----------------------------------+
| **SortCases**                    | Sorts rows in the dataframe in a |
|                                  | specified order.                 |
+----------------------------------+----------------------------------+
|                                  |                                  |
+----------------------------------+----------------------------------+
| \E. Commands that control the                                       |
| flow of operations in a script                                      |
+----------------------------------+----------------------------------+
| **DoIf**                         | A set of commands that are       |
|                                  | performed when a logical         |
|                                  | expression is true. May also     |
|                                  | include ElseCommands to be       |
|                                  | performed if the logical         |
|                                  | expression is false. The         |
|                                  | commands in DoIf are performed   |
|                                  | once, and it expects a logical   |
|                                  | condition that applies to the    |
|                                  | entire dataframe. Use IfRows for |
|                                  | commands that are performed on   |
|                                  | each row depending upon values   |
|                                  | on those rows.                   |
+----------------------------------+----------------------------------+
| **Execute**                      | This command causes the system   |
|                                  | to execute preceding commands    |
|                                  | before continuing to process the |
|                                  | command script.                  |
+----------------------------------+----------------------------------+
| **IfRows**                       | A set of commands that are       |
|                                  | performed on each row in the     |
|                                  | dataframe when a logical         |
|                                  | expression is true for that row. |
|                                  | May also include ElseCommands to |
|                                  | be performed if the logical      |
|                                  | expression is false. Use DoIf    |
|                                  | for a logical condition that     |
|                                  | applies to the entire dataframe  |
|                                  | and commands that are performed  |
|                                  | once.                            |
+----------------------------------+----------------------------------+
| **LoopOverList**                 | A loop creates multiple versions |
|                                  | of a set of commands by          |
|                                  | iterating over a list of         |
|                                  | variables, numbers, or strings.  |
+----------------------------------+----------------------------------+
| **LoopWhile**                    | LoopWhile iterates over a set of |
|                                  | commands under the control of    |
|                                  | one or more logical expressions. |
|                                  | Since the logical conditions     |
|                                  | typically depend upon values in  |
|                                  | the data, commands executed in a |
|                                  | LoopWhile cannot be anticipated  |
|                                  | and expanded in SDTL.            |
+----------------------------------+----------------------------------+

+----------------------------------+----------------------------------+
| **Table 2.**                                                        |
|   **InformBase: Commands that provide information**                 |
+----------------------------------+----------------------------------+
| **Analysis**                     | Describes an analysis command.   |
|                                  | An analysis command does not     |
|                                  | result in any data               |
|                                  | transformation.                  |
+----------------------------------+----------------------------------+
| **Comment**                      | Describes a source code comment. |
+----------------------------------+----------------------------------+
| **Invalid**                      | Describes an invalid command. A  |
|                                  | command is invalid if it uses    |
|                                  | incorrect syntax, or is          |
|                                  | otherwise not allowed by the     |
|                                  | executing system.                |
+----------------------------------+----------------------------------+
| **Message**                      | Inserts message text in the SDTL |
|                                  | file.                            |
+----------------------------------+----------------------------------+
| **NoTransformOp**                | NoTransformOp is used for a      |
|                                  | command in the original script   |
|                                  | that provides important          |
|                                  | information but does not have a  |
|                                  | function in SDTL. For example,   |
|                                  | “library()” in R loads a package |
|                                  | of R functions. Since the Parser |
|                                  | detects the library, the SDTL    |
|                                  | will reflect the library that is |
|                                  | used, and commands derived from  |
|                                  | the library will be translated   |
|                                  | in the SDTL script. However, it  |
|                                  | is useful to know which library  |
|                                  | is active for auditing the R     |
|                                  | script, even if it does not      |
|                                  | perform any data                 |
|                                  | transformations.                 |
+----------------------------------+----------------------------------+
| **Unsupported**                  | Describes an unsupported         |
|                                  | command. An unsupported command  |
|                                  | is valid syntax, but not         |
|                                  | supported by the parsing         |
|                                  | application.                     |
+----------------------------------+----------------------------------+

+--------------------------------------+----------------------------------+
| **Table 3.**                                                            |
|   **ExpressionBase: SDTL Types Used in Expressions**                    |
+--------------------------------------+----------------------------------+
| **BooleanConstantExpression**        | BooleanConstantExpression takes  |
|                                      | values of TRUE and FALSE.        |
+--------------------------------------+----------------------------------+
| **FunctionCallExpression**           | An expression evaluated by       |
|                                      | reference to the Function        |
|                                      | Library.                         |
+--------------------------------------+----------------------------------+
| **GroupedExpression**                | A group of expressions to be     |
|                                      | evaluated before expressions     |
|                                      | outside of the group. Used to    |
|                                      | control the order of operations  |
|                                      | in a formula.                    |
+--------------------------------------+----------------------------------+
| **IteratorSymbolExpression**         | The name of an iterator symbol   |
|                                      | used as an index in describing   |
|                                      | the actions of a loop.           |
+--------------------------------------+----------------------------------+
| **MissingValueConstantExpression**   | A missing value constant. Some   |
|                                      | languages allow multiple missing |
|                                      | value constants.                 |
+--------------------------------------+----------------------------------+
| **NumberRangeExpression**            | Defines a range of numeric       |
|                                      | values.                          |
+--------------------------------------+----------------------------------+
| **NumericConstantExpression**        | A numeric constant.              |
+--------------------------------------+----------------------------------+
| **NumericMaximumValueExpression**    | Represents the largest numeric   |
|                                      | value supported by a system.     |
+--------------------------------------+----------------------------------+
| **NumericMinimumValueExpression**    | Represents the smallest numeric  |
|                                      | value supported by a system.     |
+--------------------------------------+----------------------------------+
| **StringConstantExpression**         | A text string.                   |
+--------------------------------------+----------------------------------+
| **StringRangeExpression**            | Defines a range of string        |
|                                      | values.                          |
+--------------------------------------+----------------------------------+
| **UnhandledValuesExpression**        | Represents any values not        |
|                                      | previously handled (for example, |
|                                      | in a set of recode rules).       |
+--------------------------------------+----------------------------------+
| **ValueListExpression**              | Wraps a list of other            |
|                                      | expressions.                     |
+--------------------------------------+----------------------------------+
| **VariableReferenceBase**            | SDTL types used to describe      |
|                                      | variables. See Table 3.          |
+--------------------------------------+----------------------------------+

+-------------------------------------+----------------------------------+
| **Table 4.**                                                           |
|   **VariableReferenceBase: SDTL Types Used to Describe Variables**     |
|   **in Expressions**                                                   |
+-------------------------------------+----------------------------------+
| **AllNumericVariablesExpression**   | An expression that represents    |
|                                     | all numeric variables in the     |
|                                     | dataset, similar to \`_all\` in  |
|                                     | SPSS or Stata.                   |
+-------------------------------------+----------------------------------+
| **AllTextVariablesExpression**      | An expression that represents    |
|                                     | all text variables in the        |
|                                     | dataset, similar to \`_all\` in  |
|                                     | SPSS or Stata.                   |
+-------------------------------------+----------------------------------+
| **AllVariablesExpression**          | An expression that represents    |
|                                     | all variables in the dataset,    |
|                                     | similar to \_all in SPSS or      |
|                                     | Stata.                           |
+-------------------------------------+----------------------------------+
| **CompositeVariableNameExpression** | A composite variable name is     |
|                                     | used to describe a variable name |
|                                     | that is computed.                |
+-------------------------------------+----------------------------------+
| **VariableListExpression**          | A list of variables which may    |
|                                     | include variable names           |
|                                     | (VariableSymbolExpression) and   |
|                                     | variable ranges                  |
|                                     | (VariableRangeExpression).       |
+-------------------------------------+----------------------------------+
| **VariableRangeExpression**         | A list of variables in adjacent  |
|                                     | columns defined by the variable  |
|                                     | names of first and last columns. |
+-------------------------------------+----------------------------------+
| **VariableSymbolExpression**        | A reference to a variable.       |
+-------------------------------------+----------------------------------+

+----------------------------------+----------------------------------+
| **Table 5.**                                                        |
|   **Types for Complex Properties in SDTL Commands**                 |
+----------------------------------+----------------------------------+
| **AppendFileDescription**        | Describes files used in an       |
|                                  | AppendDatasets command.          |
+----------------------------------+----------------------------------+
| **DataframeDescription**         | Describes a dataframe in the     |
|                                  | consumesDataframe or             |
|                                  | producesDataframe types.         |
|                                  | Provides the name of the data    |
|                                  | frame and a list of variables    |
|                                  | (columns). DataframeDescription  |
|                                  | can also define dimensions in    |
|                                  | dataframes that have             |
|                                  | hierarchical indexes, data       |
|                                  | cubes, or multi-indexes.         |
+----------------------------------+----------------------------------+
| **FunctionArgument**             | Describes the arguments in a     |
|                                  | function as specified in the     |
|                                  | SDTL Function Library.           |
+----------------------------------+----------------------------------+
| **IteratorDescription**          | Describes an iteration process   |
|                                  | consisting of an                 |
|                                  | IteratorSymbolExpression and a   |
|                                  | list of values it takes.         |
+----------------------------------+----------------------------------+
| **MergeFileDescription**         | Describes files used in a        |
|                                  | MergeDatasets command.           |
+----------------------------------+----------------------------------+
| **RecodeRule**                   | Describes how values will be     |
|                                  | recoded.                         |
+----------------------------------+----------------------------------+
| **RecodeVariable**               | Describes a variable that will   |
|                                  | have its values recoded.         |
+----------------------------------+----------------------------------+
| **RenamePair**                   | Variable names before and after  |
|                                  | a variable is renamed.           |
+----------------------------------+----------------------------------+
| **ReshapeItemDescription**       | Describes a new variable created |
|                                  | by reshaping a dataset from wide |
|                                  | to long.                         |
+----------------------------------+----------------------------------+
| **SortCriterion**                | Describes a criterion by which   |
|                                  | cases are sorted, including the  |
|                                  | variable name and whether to     |
|                                  | sort ascending or descending.    |
+----------------------------------+----------------------------------+
| **SourceInformation**            | SourceInformation defines        |
|                                  | information about the original   |
|                                  | source of a data transform.      |
+----------------------------------+----------------------------------+
| **ValueLabel**                   | Associates a label with a value  |
|                                  | in a categorical variable.       |
+----------------------------------+----------------------------------+
| **Weight**                       | Describes weights to apply       |
|                                  | to computations and statistics.  |
+----------------------------------+----------------------------------+

A command in a source language may result in several SDTL commands. For
example, in some languages the cut() function, which divides a numeric
variable into categories, can also apply labels to the resulting
categories. The equivalent SDTL function does not include a way to
define value labels, and it would be followed by a separate
**SetValueLabels** command.

Files and Dataframes 
--------------------

SDTL was designed to describe statistical data arranged in a
two-dimensional matrix of rows (cases) and columns (variables). We refer
to data stored on persistent media as ‘files’, and we use ‘dataframe’ to
refer to temporary storage that is available for the duration of a
command script. Files are loaded into temporary storage where they
become dataframes, and dataframes are saved to external storage as
files. A command script may use many files and dataframes.

We use the term “active dataframe” to refer to the dataframe currently
in use by the script. Some statistical packages only use one dataframe
at a time. Other packages set up a working environment in which many
dataframes are accessible. The active dataframe is not always named in
the source script, but a name may be assigned in the SDTL script for
clarification.

**AppendDatasets** and **MergeDatasets** combine rows or columns from
two or more files or dataframes into a new dataframe. **AppendDatasets**
(Figure 4) combines rows from dataframes that share common variables.
**MergeDatasets** (Figure 5) combines columns from dataframes that share
rows referring to the same subjects or entities. 

.. image:: append-datasets.png

Figure 4. **AppendDatasets**

.. image:: merge-datasets.png

Figure 5. **MergeDatasets**

**MergeDatasets** is a particularly complex operation with different 
options in every source language.  Older languages, like SPSS and SAS, 
approach file merging as a process of drawing rows from two files that 
have been pre-sorted.  R and Python use concepts like “right” and “left” 
joins, which are familiar to SQL programmers but opaque to most others.  
SDTL covers old and new source languages by identifying three properties 
defining a wide range of merge procedures.  First, SDTL has a controlled 
vocabulary for the **mergeType**.  A “Sequential” merge matches rows in the 
order that they appear in each file.  A “OneToOne” merge expects each file 
to have one or more identifying variables that uniquely link each row in 
one file to the corresponding row in the other file.  

Second, the **update** property determines the outcome when a variable 
appears in more than one input file.   Some languages use file merging as 
a way of updating data in an existing file.  For example, the merge may be 
designed to supply values for rows with missing values.  In this case, the 
SDTL **update** property of the primary file is set to “Master”, and the 
**update** property of the secondary file is set to “UpdateMissing.”
  
Third, the **newRow** property describes the outcome when a row in one file 
cannot be matched to any of the rows in the other file.  When **newRow** is 
TRUE, a row is added to the output file even if it is not matched.  When 
**newRow** is FALSE, unmatched rows are not included in the output file.  
For “inner” and “outer” joins we set the **newRow** of every file to TRUE or 
FALSE respectively.  “Left” and “right” joins involve one file set to TRUE 
and another file set to FALSE.

The “SDTL Merge Gallery” includes more than thirty examples of merge 
commands in source languages with corresponding SDTL.   

See SDTL Merge Gallery: https://gitlab.com/c2metadata/sdtl-cogs/-/blob/master/CompositeTypes/MergeDatasets/SDTL_Merge_Gallery.pdf

**Aggregate** and **Collapse** involve computations across multiple
rows. For example, consider a dataframe consisting of a row for each
child that includes the ID of the mother of each child. If we count rows
by mother’s ID, we get the number of children for each mother. The
**Aggregate** command (Figure 6) retains the existing number of rows and
adds the number of children variable to every child’s row. The
**Collapse** command (Figure 7) results in a new dataframe with one row
per mother.

.. image:: aggregate.png

Figure 6. **Aggregate**

.. image:: collapse.png

Figure 7. **Collapse**

Conditional Execution by Row or by File/Dataframe
-------------------------------------------------

The execution of a command may depend on a logical condition that is
evaluated sequentially for each row in a dataframe or on a logical
condition that applies to the entire dataframe. For example, a
**Compute** command may be applied only to rows where “varX” is less
than 1000 or it may be applied to all rows when the average of “varX” is
less than 1000. In SDTL **IfRows** is used when the condition is
evaluated and executed separately on each row, and **DoIf** is used when
the condition is evaluated once for the entire dataframe. Both
**IfRows** and **DoIf** can be applied to a group of commands, and both
include an **elseCommands** property for commands to be performed if the
condition is false.

Links to related documents
--------------------------

:doc:`Hierarchical Indexing in SDTL -- data cubes and multi-indexes </command-language/Hierarchical_Indexing_in_SDTL>`


Notes
-----
.. [1]
   In this section SDTL the names of types and properties are given in
   **boldface**. SDTL types begin with uppercase, and properties begin
   with lowercase. Values are surrounded by double quotes (“”).

.. [2]
   As explained in SDTL Best Practices and Conventions #1, the “$type”
   property may be omitted when only one type is allowed.