
Pentaho data integration scaling




We had to design and build a data warehouse for a multi-tenant architecture. There were multiple clients with different data metrics and source databases, while the data model (dimensions and facts) was common. To start with, we developed the ETL (Extract, Transform & Load) jobs for a single client, and these jobs then had to be scaled to the new requirement. To load another client's data using the same ETL job, one can change the source and target database configurations in the jdbc.properties file, but this is not a scalable approach, as the properties file needs to be modified every time a job has to be executed.
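To make the pain concrete, PDI resolves JNDI connections from the simple-jndi jdbc.properties file, where each connection is a small block of name-prefixed keys. The sketch below is illustrative only: the connection name, driver, host and credentials are hypothetical, not taken from the project described here.

# Hypothetical entry in simple-jndi/jdbc.properties (connection name,
# driver, host and credentials are made up for illustration).
sales_dwh/type=javax.sql.DataSource
sales_dwh/driver=com.mysql.jdbc.Driver
sales_dwh/url=jdbc:mysql://db.client1.example.com:3306/sales
sales_dwh/user=etl_user
sales_dwh/password=changeit

Re-pointing the same job at another client means editing these values before every run, which is exactly what does not scale.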


Moreover, since Pentaho refers to JNDI (Java Naming and Directory Interface) connections by name, one cannot define two JNDI connections with the same name. There is also a good chance of the same job being executed in parallel for different source and target databases, and in such situations multiple parameters might be overridden during parallel execution because of the shared kettle.properties file.
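kettle.properties is an ordinary KEY=VALUE file that PDI loads at start-up, so a single shared copy can only hold one value per variable. As a hedged illustration (the variable names below are hypothetical), two client runs sharing this file would both pick up whichever values were written last:

# Hypothetical shared kettle.properties -- the last edit wins for every
# job that starts afterwards, so parallel client runs clash.
CLIENT_CODE=client1
SOURCE_SCHEMA=client1_sales
TARGET_SCHEMA=client1_dwh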


An inefficient way to tackle this problem would be to create a separate job for each client. Imagine a scenario with hundreds of ETL jobs and tens of clients! It would lead to a huge duplication of effort and would be operationally ineffective. We therefore wanted to achieve our objective with minimal rework and a good design approach that would be maintainable and operationally effective.

Suppose PDI is installed on a server at "/opt/data-integration" and a PDI job is run through the kitchen command below. By default, it searches for kettle.properties and repositories.xml under "KETTLE_HOME/.kettle" and for jdbc.properties under "/opt/data-integration/simple-jndi".

/opt/data-integration/kitchen.sh -rep test -job load_data_job

To overcome this problem, one can assign the paths while firing the kitchen command:

KETTLE_HOME="/data/client1/" KETTLE_JNDI_ROOT="/data/client1/jndi/" /opt/data-integration/kitchen.sh -rep test -job load_data_job
KETTLE_HOME="/data/client2/" KETTLE_JNDI_ROOT="/data/client2/jndi/" /opt/data-integration/kitchen.sh -rep test -job load_data_job

In the above example, we have separated the kettle.properties and jdbc.properties files for the two clients into different locations. If you run the first kitchen command, it reads kettle.properties from "/data/client1/" and jdbc.properties from "/data/client1/jndi/". If you run the second kitchen command, it reads kettle.properties from "/data/client2/" and jdbc.properties from "/data/client2/jndi/". Similarly, for each new client one can set up the configuration in a new location and point the kitchen command to the new kettle and jdbc properties files.
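To keep the invocation repeatable, the per-client call can be wrapped in a small script. This is only a sketch built on the commands above; the script name, the repository name "test", and the /data/<client>/ layout are assumptions.

#!/bin/sh
# run_client_job.sh <client> <job>  -- hypothetical wrapper around kitchen.sh.
# Example: ./run_client_job.sh client1 load_data_job
CLIENT="$1"
JOB="$2"

# KETTLE_HOME points kitchen at /data/<client>/.kettle/kettle.properties;
# KETTLE_JNDI_ROOT points it at /data/<client>/jndi/jdbc.properties.
KETTLE_HOME="/data/${CLIENT}/" \
KETTLE_JNDI_ROOT="/data/${CLIENT}/jndi/" \
/opt/data-integration/kitchen.sh -rep test -job "${JOB}"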





