Differences between revisions 19 and 20

Migration External Data into V2

V2Master

Contents

Introduction

Import Tools

V2Import Rails Modul

Module Classes

A new module V2Import implemented in lib/v2_import.rb. The module has the following main classes:
- Helper - methods used in other classes of the module
- Reader::DSLReader - class to read and check the import DSL configuration file
- Reader::CSVReader - read the input CSV file into a hash
- Import - loads data into the database based on the result of the two previous class

Module Usage

The module can be used from the Rails console as described in this chapter or the Rake wrapper can be used as described in chapter "V2Import Rake Task" below.
- Staring the Rails console:
```
$ rails c production  # The development system will be used if "production" omitted
> ActiveRecord::Base.logger.level = 1 # Switch off SQL logging to allow to see error messages easily
> Authorization.current_user = User.find(2) # Set the appropriate user to avoid authorization errors
```
The module should be initialized first passing the filenames (CSV file, import configuration DSL file) and CSV options:
- ```
import = V2Import::Import.new csv_file, config_file, csv_options
```
csv_options are option of the standard Ruby CSV.new method. They can be used to change the default separators (col_sep: ',', row_sep: 'auto', quote_char: '"'). The module initialization exits with exception on the first configuration error found. The initialzed module can be queried to show the internal data formats:
- ```
y import.config_hash # show the internal format of the import configuration, read from the DSL file
y import.csv_hash    # show the internal format of the input CSV file
```
The load method of the initialized module should be called to load the data into the database:
- ```
import.load(load_options)
```
load_options:
- insert_only: default = true, all records are new.
- rollback: default = false, rollback at the end, good for testing the input.
The load method doesn't exits on errors, but rollbacks every changes if data error occurred. The 2 calls can be chained of course:
- ```
V2Import::Import.new(csv_file, config_file, csv_options).load(load_options)
```

DSL File Format

The file format explained on two samples. Pre-defined DSL files can be found in script/v2_import/. The first format is a document/revision/notes/classification import:

# encoding: utf-8                                # needed only to allow syntax check with ruby -c
v2_import do                                     # outline block always v2_import
  date_format "%d.%m.%y"                         # optional input date format, defaults accepted: [YY]YY[-./]MM[-./]DD[-./]
  root_path "/var/tmp"                           # optional root path for Paperclip::Attachments
  model Document do                              # the main model block, can contain defaults, attributes,
                                                 # classifications blocks
    keys :code                                   # list of table fields of a possible composed unique key to
                                                 # find database records to be updated (use symbols)
    defaults do                                  # default values for the containing model
                                                 # the right side can contain evaluable rails code
      resource_sub_type_id ResourceSubType.get_default_by_resource(Document).id
      folder :foreign_key do                     # association can be only foreign_key to load default value
        attributes do                            # attributes block for the folder record
          code "WBS"                             # lookup attributes, the right side can contain evaluable rails
                                                 # code
        end
      end
    end

    attributes do                               # attributes block for the main record
      code "Nº CARTA"                           # the right side must contain a reference to a CSV column
      revisions :required do                    # association is also an attribute, it can contain defaults,
                                                # attributes, classifications blocks again
                                                # required option means - that the record is created anyway
        keys :code                              # unique key within the revision records associated to the document
        defaults do
          code "1"
          number 1
          user :foreign_key,  do
            attributes do
              initials "AH /SXS"
            end
          end
          resource_sub_type :foreign_key, :required do # this is a required association as foreign key
            attributes do
              code "Revision Type"
            end
          end
        attributes do
          date "FECHA DE EMISION"               # the right side must contain a reference to a CSV column
          title1 "DESCRIPCION"
          notes do                              # not required -> created only if at least 1 non-default
                                                # attribute filled (in this case the note attribute)
            keys :issue_datetime
            defaults do
              issuer_id 1
            end
            attributes do
              note "OBSERVACIONES"              # the right side must contain a reference to a CSV column
              issue_datetime "FECHA DE EMISION" # the same CSV column can be used more than once
            end
          end
        end
        classifications do                      # classification of the revision, see description below
          # "zero level" classification, the value in CSV column will go into one of the value fields of object_classifications depending on data_tape.
          REASON  "Reason for Issue", lookup: nil
        end
      end
    end

    # all classifications defined as
    # - code of the classification class (list)
    # - CSV column reference for the lookup value
    # - lookup option for the lookup field to find the classification
    #   - "lookup: nil" means a "zero level" classification
    # Note, that a classification class can appear more than 1 time to add more
    # classification in the same class.
    classifications do                    # classification of the document
      ORG "DESTINATARIO", lookup: :name # Destination
      LOC "ESTADO",       lookup: :name # Status
    end
  end
  before_save do                               # optional callback before saving the instance
    self.... = ....
  end
  after_save do                                # optional callback after saving the instance
    self.... = ....
    save!
  end
end

The second example to load references (object_links) in a second step if both sides of the references are already loaded:

v2_import do
  model ObjectLink do
    attributes do
      left :foreign_key, :required, model: Document do  # define model, because polymorphic assoc.
        attributes do
          code "Left Code"
        end
      end
      right :foreign_key, :required, model: Document do # define model, because polymorphic assoc.
        attributes do
          code "Right Code"
        end
      end
    end
  end
end

CSV File Format

The input file is a standard CSV file, with blank lines and comment lines (lines with starting '#') ignored.

Limitations

The current version (2014-08-14) does not support the following options:

multiple classifications in the same class
the to field in the case of 'zero-level' classification
classification data (from and to fields) on non 'zero-level' classification

Rake Task db:import

There is wrapper rake task to use the import module in lib/tasks/v2_import.rake. Run for help:

```
rake -D db:import
```

An example:

rake db:import csv_file='/tmp/REGISTRO DE CARTAS 2014.csv' config_file='config/sites/astaldi/site/import/import_cfg_registro_de_cartas_dsl.rb' rollback=true RAILS_ENV=production

Rake Task db:load_tree

There is a rake task to load tree type data into the database, as Classification, Folder and Organisation. This tool deals however only with the fields code, name, description and parent_id. So this tool is not a complete solution for classifications.
This tool understands two file formats: indented or non-indented. The parent-child relation is defined by TAB indentation in the indented format while the code of the parent is the first column in the case of the non-indented format.
Otherwise the input file is a standard header-less CSV file, with blank lines and lines starting with character "#" ignored. But use "#" with caution, "#" will be determined after CSV read, so the following error can occur if ' " ' appears somewhere in the line: CSV::MalformedCSVError: Illegal quoting in line nn.

There is wrapper rake task to use the import module in lib/tasks/v2_import.rake. Run for help:

```
rake -D db:load_tree
```

An example:

rake db:load_tree model=Folder file=/tmp/folder.csv insert_only=false rollback=true RAILS_ENV=production

Export Tools

Rake Task db:export

This is the pair of rake db:import. The same DSL can be used to export data from the database into CSV file. Run for help:

```
rake -D db:export
```

An example:

rake db:export csv_file=/tmp/a.csv config_file=script/v2_import/resource_sub_type_user_dsl.rb where='resource_type="User"'

Rake Task db:save_tree

This tool can be used to export tree type data, which can be imported by rake db:load_tree. There is wrapper rake task to use the export module in lib/tasks/v2_export.rake. Run for help:
- ```
rake -D db:save_tree
```
An example:
- ```
rake db:save_tree model=Folder file=/tmp/WBS.txt
```

Example Usage

Export Example

This sections shows how to export all data from a V2 system. This is done in two steps:

System Configuration
1. Project structures: Folders and Organisational Chart
2. Object types: Documents, Revisions, Users
3. Classification fields and schemes
4. Status lists and status values
5. File types
Project Data
1. Users
2. Local identities
3. User roles
4. User organisational assignments?
5. Work items
6. Work item links
7. Notes
8. Subscribers
9. Files

System Configuration Export

to be completed

Setup shell variables and destination directory

V2_ROOT=~/rails/v2p0              # Root directory of the V2 application

DST=${V2_ROOT}/tmp/csv            # Directory to save exported files
DSL=${V2_ROOT}/script/v2_import   # Directory where export/import DSL files are located

mkdir -p $DST

Export: Folder Structure, Organisational Chart, Classification Field Structure

rake db:save_tree model=Folder          file=$DST/folder.txt
rake db:save_tree model=Organisation    file=$DST/organisation.txt
rake db:save_tree model=Classification  file=$DST/classification.txt

Export: Object Types: Document, Revision, Transmittal, User

rake db:export config_file=$DSL/resource_sub_type_document_dsl.rb     csv_file=$DST/resource_sub_type_document.csv     where='resource_type="Document"'
rake db:export config_file=$DSL/resource_sub_type_revision_dsl.rb     csv_file=$DST/resource_sub_type_revision.csv     where='resource_type="Revision"'
rake db:export config_file=$DSL/resource_sub_type_transmittal_dsl.rb  csv_file=$DST/resource_sub_type_transmittal.csv  where='resource_type="Transmittal"'
rake db:export config_file=$DSL/resource_sub_type_user_dsl.rb         csv_file=tmp/resource_sub_type_user.csv          where='resource_type="User"'

Export: Classification Schemes:

rake db:export config_file=$DSL/classification_dsl.rb                csv_file=$DST/classification.csv
rake db:export config_file=$DSL/classification_scheme_dsl.rb         csv_file=$DST/classification_scheme.csv
rake db:export config_file=$DSL/classification_schemes_class_dsl.rb  csv_file=$DST/classification_schemes_class.csv

Export: Status Lists and Status Values

rake db:export config_file=$DSL/status_list_dsl.rb   csv_file=$DST/status_list.csv
rake db:export config_file=$DSL/status_value_dsl.rb  csv_file=$DST/status_value.csv

Export: File Types

rake db:export config_file=$DSL/file_type_ref_dsl.rb csv_file=$DST/file_type_ref.csv

System Data Export

to be completed

Import Example

Import Dependencies

Import data should be imported in the following order:

Folders & Organisation
1. folder - no dependencies
2. organisation - no dependencies
Workflow Status
1. status_list - no dependencies
2. status_value - depends on status_list
Classification Fields & Schemes
1. classification - no dependencies
2. classification_scheme - depends on classifications
Object Types
1. resource_sub_type_revision - depends on classification schemes
2. resource_sub_type_document - depends on classification schemes & revisions (for default revision type
3. resource_sub_type_user - depends on classification schemes
Classification Scheme Classes
1. classification_schemes_classes
Users
1. user - depends on classification schemes
2. user_role - depends on users

System Configuration Import

to be completed

Example Configuration Import Script

The following script shows the order in which configuration files should be loaded

#
# Export system configuration
#

V2_ROOT=~/rails/master/v2p0       # Root directory of the V2 application

SRC=${V2_ROOT}/tmp/import         # Directory where the import csv files are
DSL=${V2_ROOT}/script/v2_import   # Directory where export/import DSL files are located
#SEP="col_sep=' '"                # Tab

PREFIX='2015.03.06.'
SUFFIX='-ah5'


# -- Check that files are present

echo ""
echo "---- Check all file are present.."
echo ""

RC=0

check_file () {
  if [ ! -r "$1" ] ; then
    echo "ERROR: File not found: $1" 1>&2 ; RC=1
  fi
}

check_file "${SRC}/${PREFIX}folders${SUFFIX}.txt"

check_file "${SRC}/${PREFIX}organisation${SUFFIX}.txt"

check_file "${SRC}/${PREFIX}status_list${SUFFIX}.csv"
check_file "${SRC}/${PREFIX}status_value${SUFFIX}.csv"                 

check_file "${SRC}/${PREFIX}classification${SUFFIX}.csv"
check_file "${SRC}/${PREFIX}classification_scheme${SUFFIX}.csv"

check_file "${SRC}/${PREFIX}resource_sub_type_revision${SUFFIX}.csv"
check_file "${SRC}/${PREFIX}resource_sub_type_document${SUFFIX}.csv"
check_file "${SRC}/${PREFIX}resource_sub_type_user${SUFFIX}.csv"

check_file "${SRC}/${PREFIX}classification_schemes_class${SUFFIX}.csv"

check_file "${SRC}/${PREFIX}user${SUFFIX}.csv"
check_file "${SRC}/${PREFIX}user_role${SUFFIX}.csv"

## TODO: This information is already contained in the user.csv file
##
## check_file "${SRC}/${PREFIX}users_organisation${SUFFIX}.csv"
## check_file "${SRC}/${PREFIX}user_classification${SUFFIX}.csv"


if [ "$RC" != 0 ] ; then
  echo "Exiting due to previous errors"
  exit
fi
echo "OK"


# -- Org Chart

echo ""
echo "---- Import: Folders & Org Chart.."
echo ""

rake db:load_tree model=Folder       file=${SRC}/${PREFIX}folders${SUFFIX}.txt
rake db:load_tree model=Organisation file=${SRC}/${PREFIX}organisation${SUFFIX}.txt

# -- Status lists

echo ""
echo "---- Import: Status lists.."
echo ""

rake db:import \
  config_file=${DSL}/status_list_dsl.rb \
  csv_file=${SRC}/${PREFIX}status_list${SUFFIX}.csv   

rake db:import \
  config_file=${DSL}/status_value_dsl.rb \
  csv_file=${SRC}/${PREFIX}status_value${SUFFIX}.csv                 

# -- Classification

echo ""
echo "---- Import: Classification.."
echo ""

rake db:import \
  config_file=${DSL}/classification_dsl.rb \
  csv_file=${SRC}/${PREFIX}classification${SUFFIX}.csv

rake db:import \
  config_file=${DSL}/classification_scheme_dsl.rb \
  csv_file=${SRC}/${PREFIX}classification_scheme${SUFFIX}.csv

# -- Object types

echo ""
echo "---- Import: Object Types.."
echo ""

rake db:import \
  config_file=${DSL}/resource_sub_type_revision_dsl.rb \
  csv_file=${SRC}/${PREFIX}resource_sub_type_revision${SUFFIX}.csv

rake db:import \
  config_file=${DSL}/resource_sub_type_document_dsl.rb \
  csv_file=${SRC}/${PREFIX}resource_sub_type_document${SUFFIX}.csv

rake db:import \
  config_file=${DSL}/resource_sub_type_user_dsl.rb \
  csv_file=${SRC}/${PREFIX}resource_sub_type_user${SUFFIX}.csv

# -- Classification Schemes

echo ""
echo "---- Import: Classification Schemes.."
echo ""

rake db:import \
  config_file=${DSL}/classification_schemes_class_dsl.rb\
  csv_file=${SRC}/${PREFIX}classification_schemes_class${SUFFIX}.csv

# -- Users

echo ""
echo "---- Import: Users.."
echo ""

rake db:import \
  config_file=${DSL}/user_dsl.rb \
  csv_file=${SRC}/${PREFIX}user${SUFFIX}.csv

rake db:import insert_only=false insert_only=false \
  config_file=${DSL}/user_role_dsl.rb \
  csv_file=${SRC}/${PREFIX}user_role${SUFFIX}.csv

## TODO: This information is already contained in the user.csv file
##
## rake db:import \
##   config_file=${DSL}/users_organisation_dsl.rb \
##   csv_file=${SRC}/${PREFIX}users_organisation${SUFFIX}.csv
## 
## rake db:import \
##   config_file=${DSL}/user_classification_dsl.rb \
##   csv_file=${SRC}/${PREFIX}user_classification${SUFFIX}.csv

echo ""
echo "Done."

System Data Import

to be completed

Deletions are marked like this.	Additions are marked like this.
Line 340:	Line 340:
1. Classification Scheme Classes	1. '''Classification Scheme Classes'''