The data consist of ~30k points, each with X and Y coordinates so they can be placed on a graph. In addition, each point belongs to between 0 and 5 categories. The task is to write an R script that accomplishes the following:
Step 1: The points should be clustered based on their proximity to one another and based on the categories they belong to. This should be written in such a way that the user can select the number of clusters in the solution (e.g., 25).
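To illustrate what I mean by Step 1, here is a minimal sketch using base R's kmeans. It assumes the categories are stored as 0/1 indicator columns; the column names (cat_1 to cat_3), the cat_weight parameter, the cluster_points helper, and the simulated data are all made up for illustration and are not taken from the attached file. Note that giving the categories any weight can produce clusters that are not spatially compact, which matters for Step 2.

```r
# Simulated stand-in for the real data (the real file has ~30k points).
set.seed(42)
n <- 1000
pts <- data.frame(
  x = runif(n),
  y = runif(n),
  cat_1 = rbinom(n, 1, 0.3),  # hypothetical 0/1 category indicators
  cat_2 = rbinom(n, 1, 0.2),
  cat_3 = rbinom(n, 1, 0.1)
)

# Cluster on standardized coordinates plus down-weighted category indicators.
# cat_weight = 0 reduces this to purely spatial k-means.
cluster_points <- function(pts, k, cat_cols, cat_weight = 0.5) {
  xy <- scale(as.matrix(pts[, c("x", "y")]))
  cats <- cat_weight * as.matrix(pts[, cat_cols, drop = FALSE])
  kmeans(cbind(xy, cats), centers = k, nstart = 10, iter.max = 100)$cluster
}

k <- 25  # user-selected number of clusters
pts$cluster <- cluster_points(pts, k, cat_cols = c("cat_1", "cat_2", "cat_3"))
table(pts$cluster)
```

The weight is the main design choice here: a larger cat_weight pulls points with matching categories together even when they are far apart on the map.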
Step 2: Contiguous polygons should be drawn around each cluster such that all points assigned to the cluster are inside the borders of the polygon, and the polygon is contiguous with neighboring polygons. The attached example illustrates what the borders around clusters might look like. Notice that the clusters share borders with their neighbors. Imagine that all the points are cities within Europe and the clusters are countries. All countries must share borders; there can't be any "unclaimed" space on the map.
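One possible way to get borders with no unclaimed space (a sketch, not necessarily the approach I expect) is to build a Voronoi cell around every point and then merge the cells that belong to the same cluster. The example below assumes the sf package and uses simulated points clustered on coordinates only; the map frame is simply the bounding box of the points, which is my own assumption about where the map ends.

```r
library(sf)

set.seed(1)
n <- 300
pts <- data.frame(x = runif(n), y = runif(n))
pts$cluster <- kmeans(pts[, c("x", "y")], centers = 6, nstart = 5)$cluster
pts_sf <- st_as_sf(pts, coords = c("x", "y"))

# Outer frame of the map (assumed here: bounding box of the points).
map_frame <- st_as_sfc(st_bbox(pts_sf))

# One Voronoi cell per point; the cells tile the plane without gaps.
cells <- st_collection_extract(st_voronoi(st_union(pts_sf)), "POLYGON")

# Voronoi output is not in input order, so look up each point's cell.
idx <- vapply(st_intersects(pts_sf, cells), function(i) i[1], integer(1))

# Merge the cells of each cluster, then clip to the map frame.
ids <- sort(unique(pts$cluster))
geoms <- lapply(ids, function(k) {
  st_intersection(st_union(cells[idx[pts$cluster == k]]), map_frame)
})
polys <- st_sf(cluster = ids, geometry = do.call(c, geoms))

plot(st_geometry(polys))
plot(st_geometry(pts_sf), add = TRUE, pch = 16, cex = 0.4)
```

Because every cell belongs to exactly one cluster, neighboring polygons share borders by construction. If a cluster is not spatially compact (for example when categories influence the clustering), its merged shape can come out as several disconnected pieces.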
Step 3: The points within each cluster should be clustered again based on their proximity to one another and based on the categories they belong to. Ideally the optimal number of sub-clusters would be detected automatically by the clustering algorithm (for instance, with DBSCAN or OPTICS), but it is acceptable for the user to define the number of sub-clusters manually if no automatic solution is suitable (e.g., k-means).
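As a rough sketch of Step 3, the example below runs DBSCAN separately inside each top-level cluster, assuming the dbscan package. The data are simulated, only the coordinates are used, and the eps and minPts values are placeholders that would need tuning on the real data. Category columns could be appended to the input matrix in the same way as in Step 1.

```r
library(dbscan)

set.seed(3)
# Simulated stand-in: 3 top-level clusters, each containing two dense blobs.
centers <- data.frame(
  cluster = rep(1:3, each = 2),
  cx = c(0, 1, 5, 6, 10, 11),
  cy = c(0, 1, 0, 1, 0, 1)
)
pts <- do.call(rbind, lapply(seq_len(nrow(centers)), function(i) {
  data.frame(
    cluster = centers$cluster[i],
    x = rnorm(100, centers$cx[i], 0.1),
    y = rnorm(100, centers$cy[i], 0.1)
  )
}))

# Run DBSCAN within each cluster; labels restart at 1 inside every cluster.
pts$sub_cluster <- NA_integer_
for (k in unique(pts$cluster)) {
  rows <- pts$cluster == k
  fit <- dbscan(as.matrix(pts[rows, c("x", "y")]), eps = 0.15, minPts = 5)
  pts$sub_cluster[rows] <- fit$cluster  # 0 means "noise" in dbscan's output
}

table(pts$cluster, pts$sub_cluster)
```

DBSCAN labels outlying points as noise (sub-cluster 0). Since every point has to end up inside a sub-cluster polygon in Step 4, those points would still need to be assigned somewhere, for example to the nearest sub-cluster.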
Step 4: A polygon should be drawn around each sub-cluster, just as the original cluster boundaries were drawn. However, the original cluster polygons should act as the outer boundaries for these smaller sub-clusters, so that no sub-cluster polygon extends beyond its parent cluster. If the original clusters are countries within Europe, then these sub-clusters could be viewed as regions/districts within a country. An example of a "country" polygon can be seen in the attached example. The country is then subdivided further into regions/districts, an example of which can be seen in "state_sub-cluster [login to view URL]"
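To show what I mean by the parent polygon acting as the boundary, here is a sketch that applies the same Voronoi idea inside a single "country" and clips the result to that country's outline. It assumes the sf package; the pentagon-shaped parent polygon, the simulated points, and the use of k-means with 4 sub-clusters are all invented for the example.

```r
library(sf)

set.seed(7)
# Hypothetical parent ("country") polygon: an irregular pentagon.
parent <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(4, 0), c(5, 3), c(2, 5), c(0, 3), c(0, 0)
))))

# Simulated points, keeping only those that fall inside the parent.
cand <- st_as_sf(
  data.frame(x = runif(400, 0, 5), y = runif(400, 0, 5)),
  coords = c("x", "y")
)
pts_sf <- cand[st_intersects(cand, parent, sparse = FALSE)[, 1], ]
pts_sf$sub_cluster <- kmeans(st_coordinates(pts_sf), centers = 4,
                             nstart = 5)$cluster

# Voronoi cells for the points of this parent only.
cells <- st_collection_extract(
  st_voronoi(st_union(pts_sf), st_as_sfc(st_bbox(parent))), "POLYGON"
)
idx <- vapply(st_intersects(pts_sf, cells), function(i) i[1], integer(1))

# Merge cells per sub-cluster and clip to the parent outline.
ids <- sort(unique(pts_sf$sub_cluster))
regions <- lapply(ids, function(k) {
  st_intersection(st_union(cells[idx[pts_sf$sub_cluster == k]]), parent)
})
sub_polys <- st_sf(sub_cluster = ids, geometry = do.call(c, regions))

plot(st_geometry(sub_polys))
plot(parent, add = TRUE, lwd = 3)
```

In a full script this would be repeated for every cluster polygon from Step 2, so that the regions of each country exactly fill that country.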
Step 5: Finally, the polygons should be saved as shapefiles (.shp), and the cluster and sub-cluster assignments for each point should be saved as a .csv file.
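For the output step, something along these lines would be fine. The sketch assumes the sf package; the two toy polygons, the assignments table, and the file names (clusters.shp, assignments.csv) are placeholders, and a temporary folder stands in for the real output location.

```r
library(sf)

# Toy inputs standing in for the results of Steps 1 to 4.
polys <- st_sf(
  cluster = 1:2,
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
  )
)
assignments <- data.frame(
  point_id = 1:4,
  x = c(0.2, 0.8, 1.3, 1.7),
  y = c(0.5, 0.4, 0.6, 0.2),
  cluster = c(1, 1, 2, 2),
  sub_cluster = c(1, 2, 1, 1)
)

# Fresh output folder (placeholder location).
out_dir <- tempfile("cluster_output_")
dir.create(out_dir)
shp_path <- file.path(out_dir, "clusters.shp")
csv_path <- file.path(out_dir, "assignments.csv")

# A shapefile layer holds one geometry type, so cast everything to MULTIPOLYGON.
st_write(st_cast(polys, "MULTIPOLYGON"), shp_path, quiet = TRUE)
write.csv(assignments, csv_path, row.names = FALSE)

list.files(out_dir)
```

One thing to keep in mind is that shapefile attribute names are limited to 10 characters, so a column such as sub_cluster would be shortened when written to .shp.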
I've attached an example data file and an example R script to demonstrate at least some of what I'm hoping to accomplish. The R script uses only the X and Y coordinates for clustering and doesn't create contiguous borders between clusters, but I hope it at least gives you a better understanding of what I'm trying to do.