World Library  

Data transformation

Article Id: WHEBN0004080917
Reproduction Date:

Title: Data transformation  
Author: World Heritage Encyclopedia
Language: English
Subject: Identity transform, Transformation language, Master data management, Data transformation, Source-to-source compiler
Collection: Articles with Example C++ Code, Metadata
Publisher: World Heritage Encyclopedia

This article is about data transformation in computer science (metadata). For statistical applications, see data transformation (statistics).

In metadata and data warehousing, a data transformation converts a set of data values from the data format of a source data system into the data format of a destination data system.

Data transformation can be divided into two steps:

  1. data mapping, which maps data elements from the source data system to the destination data system and captures any transformation that must occur;
  2. code generation, which creates the actual transformation program.

Mapping data element to data element is frequently complicated by transformations that require one-to-many and many-to-one rules.

The code generation step takes the data element mapping specification and creates an executable program that can be run on a computer system. Code generation can also create transformations in easy-to-maintain computer languages such as Java or XSLT.

A master data recast is another form of data transformation, in which the entire database of data values is transformed or recast without extracting the data from the database. All data in a well-designed database is directly or indirectly related to a limited set of master database tables by a network of foreign key constraints. Each foreign key constraint depends on a unique database index of the parent table. Therefore, when the appropriate master table is recast with a different unique index, the directly and indirectly related data are also recast or restated. The related data can still be viewed in its original form, because the original unique index still exists alongside the master data. The recast must also be performed in a way that does not affect the application software architecture.

When the data mapping is indirect via a mediating data model, the process is also called data mediation.

Transformational languages

Numerous languages are available for performing data transformation, varying in their accessibility (cost) and general usefulness. Many transformation languages require a grammar to be provided, often structured in a notation closely resembling Backus–Naur Form (BNF). Examples of such languages include:

  • AWK - one of the oldest and most popular textual data transformation languages;
  • Perl - a high-level language with both procedural and object-oriented syntax, capable of powerful operations on binary or text data;
  • Template languages - specialized for transforming data into documents (see also template processor);
  • TXL - a language for prototyping language-based descriptions, used for source code or data transformation;
  • XSLT - the standard XML data transformation language (XQuery can substitute for it in many applications).

Although dedicated transformation languages are typically best suited to the task, something as simple as regular expressions can achieve useful transformations. Text editors such as Emacs or TextPad support regular-expression search and replace with arguments (captured parts of the match), which allows every instance of a particular pattern to be replaced with another pattern that reuses parts of the original. For example:

foo ("some string", 42, gCommon);
bar (someObj, anotherObj);

foo ("another string", 24, gCommon);
bar (myObj, myOtherObj);

could both be transformed into a more compact form like:

foobar("some string", 42, someObj, anotherObj);
foobar("another string", 24, myObj, myOtherObj);

In other words, every invocation of foo with three arguments that is followed by an invocation of bar with two arguments is replaced with a single function invocation that uses some or all of the original arguments.

Another advantage of regular expressions is that they do not fail the null transform test: using the transformation language of choice, run a sample program through a transformation that performs no changes, and check that the output is identical to the input. Many transformation languages fail this test.

Transforming source code

Program synthesis, automatic programming, and other fields use data transformation strategies to translate, adapt, or even generate software source code. Conversely, source transformation tools can be used for data transformation, typically to transform "document source code" such as HTML or another XML dialect (see also template processors).

For further information on (software) source transformation, see Cassidy[1] (chapter 2.4) or Cordy.[2]

Generally, transformations fall into one of two categories:[3]

  • Translation: a transformation from a language X into another language Y.
  • Rephrasing: a transformation within the same language, in which the program is merely stated a different way.

Example

A difficult problem to address in C++ is "unstructured preprocessor directives": directives that do not enclose blocks of code with simple grammatical descriptions, as in this function definition:

   void MyFunc ()
   {
       if (x>17)
       {   printf("test");
   #   ifdef FOO
       } else {
   #   endif
           if (gWatch)
               mTest = 42;
       }
   }

A fully general solution is very hard, because such directives can essentially edit the underlying language in arbitrary ways. However, because in practice they are not used in completely arbitrary ways, practical tools for handling preprocessed languages can be built. For example, the DMS Software Reengineering Toolkit can handle structured macros and preprocessor conditionals. Brabrand and Schwartzbach (2000)[4] offer another approach: replacing the C preprocessor with a metamorphic one.

See also

  • File Formats, Transformation, and Migration (related Wikiversity article)

References

  1. T. Cassidy (2004). "Concurrency Analysis of Java RMI Using Source Transformation and Verisoft".
  2. J. R. Cordy (2006). "The TXL source transformation language". DOI 10.1016/j.scico.2006.04.002.
  3. Eelco Visser (2001). "A Survey of Strategies in Program Transformation Systems". Electronic Notes in Theoretical Computer Science, 57:363-377.
  4. Claus Brabrand and Michael I. Schwartzbach (2000). "Growing Languages with Metamorphic Syntax Macros". BRICS Report Series RS-00-24. BRICS, Denmark. ISSN 0909-0878.

External links

  • Extraction and Transformation at DMOZ

This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.