Please have a look at my profile. I have 12 years of experience developing large-scale enterprise applications in Java. If you are interested, kindly contact me.
Technical Expertise:
Spring Boot, Hibernate, Spring Data JPA, JMS, Elasticsearch, Web Services (SOAP/RESTful),
AWS cloud computing (S3, VPC, EC2, ELB, SQS, RDS, CloudWatch, CloudFormation scripts), MQ, ESB, Message Broker, MongoDB, Multithreading, Microservices.
For the last two years I have been working with Elasticsearch, and I recently upgraded to 5.6.4.
Regarding the problem statements:
1. If a node takes 30 minutes to recover, there could be several reasons. One could be an infrastructure issue. The number of master and data nodes also matters. However, if one node goes down, it should not affect search, because the data is replicated across different data nodes (assuming replica shards are configured for the index).
2. Aggregation queries tend to be heavy. The type of the field the aggregation runs on (analyzed or not analyzed) also matters.
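To illustrate point 2, here is a minimal sketch of a terms aggregation request body that targets a not-analyzed field. The aggregation name "by_status" and the field "status.keyword" are only assumptions for illustration, since I have not seen your mapping. In 5.x a "keyword" field is the not-analyzed variant, while aggregating on an analyzed "text" field needs fielddata and is usually much more expensive.

```java
public class TermsAggregationSketch {

    // Builds the JSON body for a terms aggregation.
    // "size": 0 skips the search hits so only the bucket counts come back.
    // aggName and field are supplied by the caller; the values used in main
    // are hypothetical and must be replaced with names from the real mapping.
    public static String termsAggregationBody(String aggName, String field) {
        return "{\"size\":0,\"aggs\":{\"" + aggName
                + "\":{\"terms\":{\"field\":\"" + field + "\"}}}}";
    }

    public static void main(String[] args) {
        // Prints the body that would be sent to POST /<index>/_search
        System.out.println(termsAggregationBody("by_status", "status.keyword"));
    }
}
```

The body would be posted to the _search endpoint of the index. Whether this helps depends on the current mapping of the aggregated field.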
Which Elasticsearch version is currently in use?