Cancelled

Automate Greenplum database setup on EC2

I am going to test how well Greenplum performs when a query runs across many EC2 servers. I therefore need you to automate starting many EC2 instances, installing Greenplum on each one, and configuring Greenplum so that queries run in parallel. The setup is described in detail in the Greenplum admin guide, which is available after registering and logging in at: <[url removed, login to view]>

I want the scripts written in Python, possibly using tools such as pexpect. The Python script must run on Windows, and ideally also on Linux. The base EC2 instance should be ami-7c1c3408 (located in the EU).
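As a rough illustration of the instance-launch step, the sketch below builds the arguments for an EC2 launch request and passes them to boto3. The use of boto3, the region `eu-west-1`, the instance type `m1.large`, and the key pair name `gp-key` are all assumptions; the posting only fixes the AMI ID.

```python
def build_run_instances_params(node_count, ami_id="ami-7c1c3408",
                               instance_type="m1.large", key_name="gp-key"):
    """Return keyword arguments for the boto3 EC2 client's run_instances call."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        "ImageId": ami_id,
        # MinCount == MaxCount: fail instead of starting a partial cluster.
        "MinCount": node_count,
        "MaxCount": node_count,
        "InstanceType": instance_type,  # assumption, not given in the posting
        "KeyName": key_name,            # hypothetical key pair name
    }


def launch_nodes(node_count, access_key, secret_key, region="eu-west-1"):
    """Start the instances and return their IDs (needs boto3 and AWS access)."""
    # Imported here so the pure helper above works without boto3 installed.
    import boto3

    ec2 = boto3.client(
        "ec2",
        region_name=region,  # assumption: the posting only says "EU"
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    response = ec2.run_instances(**build_run_instances_params(node_count))
    return [instance["InstanceId"] for instance in response["Instances"]]
```

Keeping the parameter construction separate from the AWS call makes it possible to inspect or adjust the request without starting any instances.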

My interface would be to set the following variables in the python code:

secretKey = "adfjndfndf"

theotherkey = "adjdjfnjdfn"

number_of_greenplum_nodes = 40

greenplum_password_to_be_used = "adfadfjndf"

I may also need to set paths to key files stored on the local machine.
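One possible way to group these settings at the top of the script is sketched below. The class name `ClusterConfig`, the optional `key_file_path` field, and the validation rules are hypothetical; only the four variable names and example values come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterConfig:
    """User-editable settings; field names mirror the variables listed above."""
    secretKey: str
    theotherkey: str
    number_of_greenplum_nodes: int
    greenplum_password_to_be_used: str
    # Hypothetical extra: path to a key file on the local machine, if needed.
    key_file_path: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if a setting is obviously unusable."""
        if self.number_of_greenplum_nodes < 1:
            raise ValueError("number_of_greenplum_nodes must be at least 1")
        for name in ("secretKey", "theotherkey", "greenplum_password_to_be_used"):
            if not getattr(self, name):
                raise ValueError(f"{name} must not be empty")


config = ClusterConfig(
    secretKey="adfjndfndf",
    theotherkey="adjdjfnjdfn",
    number_of_greenplum_nodes=40,
    greenplum_password_to_be_used="adfadfjndf",
)
config.validate()
```

Validating the settings before any instance is started avoids paying for a cluster that cannot be configured.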

This should be all that is needed from my side. When I click "run" everything should be automated right up to where I can connect to the coordinator node at port 6453 (or whatever it was). You may use WinSCP and other external tools, but please include them in your final deliverable so I can just click "run" to test your solution.
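The "automated right up to where I can connect" condition could be checked with a small polling helper like the one below. It is only a sketch: the function name `wait_for_port` and the timeout values are hypothetical, and the port is passed in as a parameter rather than assumed.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=300.0, poll_interval_s=5.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=poll_interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval_s)
```

A successful TCP connection shows only that the port is open; a real check would follow it with a database login.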

The EC2 and Greenplum setup should be secure, so that attackers cannot easily get in.
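One way to approach this is to restrict inbound traffic at the EC2 security group level. The sketch below builds rule entries in the `IpPermissions` format accepted by boto3's `authorize_security_group_ingress`; the function name, the choice of rules, and the CIDR parameters are assumptions. Depending on the Greenplum version, node-to-node traffic may also need UDP, which this sketch does not cover.

```python
import ipaddress


def build_ingress_rules(admin_cidr, cluster_cidr, db_port):
    """Return IpPermissions entries: SSH and the database port only from
    admin_cidr, plus all TCP ports between the cluster nodes themselves."""
    # ip_network raises ValueError for malformed CIDR strings.
    admin_net = ipaddress.ip_network(admin_cidr)
    ipaddress.ip_network(cluster_cidr)
    if admin_net.prefixlen == 0:
        raise ValueError("admin_cidr must not be open to the whole internet")

    def tcp_rule(from_port, to_port, cidr):
        return {
            "IpProtocol": "tcp",
            "FromPort": from_port,
            "ToPort": to_port,
            "IpRanges": [{"CidrIp": cidr}],
        }

    return [
        tcp_rule(22, 22, admin_cidr),            # SSH from the admin machine
        tcp_rule(db_port, db_port, admin_cidr),  # database clients
        tcp_rule(0, 65535, cluster_cidr),        # traffic between nodes
    ]
```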

It is OK that a [url removed, login to view] needs to be placed on a server where it is accessible without login and password (unlike <[url removed, login to view]>).

Your Python code should be easy to modify, so that I can ask other coders to change it to make Greenplum run in a more optimized way.

You do not need to set up Greenplum with agents (I believe this is their name) on each node.

## Deliverables

The following is my current semi-automatic way of doing what is needed. I cannot guarantee that it works, but it gives you an overview of the task. In this description, the EC2 instances are launched manually in Elasticfox:

AMI: ami-7c1c3408 (EU)

useradd gpadmin

echo mypassword | passwd --stdin gpadmin

echo mypassword | passwd --stdin root

wget [url removed, login to view]

mv -f sshd_config /etc/ssh/

/etc/init.d/sshd restart

su -

source /usr/local/greenplum-db/[url removed, login to view]

cd /usr/local

gtar -cvf /home/gpadmin/[url removed, login to view] greenplum-db

source /usr/local/greenplum-db/[url removed, login to view]

gpscp -f /home/gpadmin/single_seg_hosts_file /home/gpadmin/[url removed, login to view] =:/usr/local

gpssh -f /home/gpadmin/single_seg_hosts_file

gtar --directory /usr/local -xvf /usr/local/[url removed, login to view]

chown -R gpadmin /usr/local/greenplum-db

chgrp -R gpadmin /usr/local/greenplum-db

mkdir /mnt/data

chown gpadmin /mnt/data

chgrp gpadmin /mnt/data

exit

Or:

You also need to edit /etc/ssh/sshd_config to allow root logins with passwords.

The line:

PermitRootLogin without-password

should be changed to:

PermitRootLogin yes

Search for text (occurs in several places): PermitRootLogin and replace with:

PermitRootLogin yes

Search for text (occurs in several places): PasswordAuthentication and replace with:

PasswordAuthentication yes

run:

/etc/init.d/sshd restart
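The manual search-and-replace in sshd_config could be automated along these lines. This is a sketch, not a tested deployment step: the function name is hypothetical, and it assumes each directive sits on its own line, possibly commented out with `#`.

```python
import re

# Matches "PermitRootLogin <value>" or "PasswordAuthentication <value>",
# optionally commented out with a leading '#'.
_DIRECTIVE = re.compile(
    r"^\s*#?\s*(PermitRootLogin|PasswordAuthentication)\s+\S+\s*$"
)


def enable_password_root_login(sshd_config_text):
    """Return sshd_config text with both directives set to 'yes'.

    The first occurrence of each directive is rewritten, later occurrences
    are dropped, and a directive that is missing entirely is appended.
    """
    wanted = {"PermitRootLogin": "yes", "PasswordAuthentication": "yes"}
    seen = set()
    out = []
    for line in sshd_config_text.splitlines():
        match = _DIRECTIVE.match(line)
        if match:
            key = match.group(1)
            if key not in seen:
                out.append(f"{key} {wanted[key]}")
                seen.add(key)
            continue  # skip duplicate occurrences
        out.append(line)
    for key, value in wanted.items():
        if key not in seen:
            out.append(f"{key} {value}")
    return "\n".join(out) + "\n"
```

The function works on text only, so the script can read the file, transform it, write it back, and then restart sshd as in the manual steps.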

chown -R gpadmin /usr/local/greenplum-db

chgrp -R gpadmin /usr/local/greenplum-db

source /usr/local/greenplum-db/[url removed, login to view]

source ~/.bashrc

mkdir /mnt/data

chown gpadmin /mnt/data

chgrp gpadmin /mnt/data

Create /home/gpadmin/all_hosts_file, which contains a list of IP addresses with the master on the first line. See page 46 in the admin guide.

Create /home/gpadmin/single_seg_hosts_file, which contains the same list of IP addresses but WITHOUT the master. See page 46 in the admin guide.

Create /home/gpadmin/multi_seg_hosts_file which is identical with single_seg_hosts_file.
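The three host files described above could be generated from the instance addresses as sketched below. The function name and the returned dictionary layout are hypothetical; the file names and their contents follow the description.

```python
def build_host_files(master_ip, segment_ips):
    """Return a mapping of host file name -> file content."""
    if not segment_ips:
        raise ValueError("at least one segment host is required")
    if master_ip in segment_ips:
        raise ValueError("master_ip must not also be listed as a segment host")
    # Master first, then every segment host, one address per line.
    all_hosts = "\n".join([master_ip, *segment_ips]) + "\n"
    # Segment hosts only, without the master.
    seg_hosts = "\n".join(segment_ips) + "\n"
    return {
        "all_hosts_file": all_hosts,
        "single_seg_hosts_file": seg_hosts,
        "multi_seg_hosts_file": seg_hosts,  # identical to single_seg_hosts_file
    }


files = build_host_files("10.0.0.1", ["10.0.0.2", "10.0.0.3"])
```

Each value can then be written to the matching path under /home/gpadmin on the master.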

gpssh-exkeys -f /home/gpadmin/all_hosts_file

gpssh -f /home/gpadmin/single_seg_hosts_file '/usr/sbin/useradd gpadmin -d /home/gpadmin -s /bin/bash'

chown gpadmin /home/gpadmin/multi_seg_hosts_file

chgrp gpadmin /home/gpadmin/multi_seg_hosts_file

su - gpadmin

source /usr/local/greenplum-db/[url removed, login to view]

gpssh-exkeys -f /home/gpadmin/all_hosts_file

gpssh -f /home/gpadmin/single_seg_hosts_file -v date

cd /home/gpadmin/

wget [url removed, login to view]

Modify gp_init_config by inserting the correct IP address for the master node.
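This edit could be scripted roughly as follows. The sketch assumes the master address lives on a `MASTER_HOSTNAME=` line, as in typical gpinitsystem configuration files; the actual gp_init_config referenced here is not shown, so the key name is left as a parameter, and the function name is hypothetical.

```python
import re


def set_master_hostname(config_text, master_ip, key="MASTER_HOSTNAME"):
    """Return config_text with the `key=` line set to master_ip.

    If no such line exists, one is appended at the end.
    """
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    new_line = f"{key}={master_ip}"
    if pattern.search(config_text):
        # A function replacement avoids any backslash interpretation.
        return pattern.sub(lambda _match: new_line, config_text, count=1)
    separator = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return f"{config_text}{separator}{new_line}\n"
```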

cd /home

mkdir s3sync

cd s3sync

wget [url removed, login to view]

gunzip [url removed, login to view]

tar xvf [url removed, login to view]

cd s3sync

Add the following text to /etc/s3conf/[url removed, login to view]:

export AWS_CALLING_FORMAT=SUBDOMAIN

/home/s3sync/s3sync/[url removed, login to view] get abucket:[url removed, login to view] /home/[url removed, login to view]

cd /home/

unzip [url removed, login to view]

chown gpadmin /home/[url removed, login to view]

gpscp -f /home/gpadmin/single_seg_hosts_file /home/[url removed, login to view] =:/usr/local

(to initialize greenplum):

su gpadmin

gpinitsystem -c /home/gpadmin/gp_init_config

Edit [url removed, login to view] and add, for example, the following line:

host all gpadmin [url removed, login to view] md5

export MASTER_DATA_DIRECTORY=/mnt/data/gp-1

source /usr/local/greenplum-db/[url removed, login to view]

gpssh -f /home/gpadmin/single_seg_hosts_file "echo 'work_mem = 32MB' | cat - >> /mnt/data/gp*/[url removed, login to view]"

gpssh -f /home/gpadmin/single_seg_hosts_file "echo 'maintenance_work_mem = 64MB' | cat - >> /mnt/data/gp*/[url removed, login to view]"

gpssh -f /home/gpadmin/single_seg_hosts_file "echo 'max_connections = 15' | cat - >> /mnt/data/gp*/[url removed, login to view]"
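The three near-identical gpssh calls above could be generated from a dictionary of settings, which also makes later tuning easier. This is a sketch: the function name is hypothetical, and the target file path is a caller-supplied parameter because the original path is truncated in this posting.

```python
import shlex


def build_setting_commands(settings, hosts_file, conf_path):
    """Return one gpssh argument list per setting.

    Each command appends a 'name = value' line to conf_path on every host
    in hosts_file. conf_path is left unquoted so that a glob such as
    /mnt/data/gp*/... is expanded by the remote shell.
    """
    commands = []
    for name, value in settings.items():
        line = f"{name} = {value}"
        remote = f"echo {shlex.quote(line)} >> {conf_path}"
        commands.append(["gpssh", "-f", hosts_file, remote])
    return commands


commands = build_setting_commands(
    {"work_mem": "32MB", "maintenance_work_mem": "64MB", "max_connections": "15"},
    "/home/gpadmin/single_seg_hosts_file",
    "/tmp/example.conf",  # placeholder path, not the real target file
)
```

Each argument list can be passed to `subprocess.run` on the master after sourcing the Greenplum environment.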

export MASTER_DATA_DIRECTORY=/mnt/data/gp-1

gpstop -r (confirmation needed)

psql template1 (press enter)

1) All deliverables will be considered "work made for hire" under U.S. Copyright law. Employer will receive exclusive and complete copyrights to all work purchased. (No 3rd party components unless all copyright ramifications are explained AND AGREED TO by the employer on the site per the worker's Worker Legal Agreement).

## Platform

Windows/Linux/EC2

Skills: C Programming, C# Programming, Java, Perl, PHP, Python, Ruby on Rails, Software Architecture, Software Testing

About the Employer:
(165 reviews) Copenhagen, Denmark

Project ID: #2967097