HowToGIS: April 2009

The original solution I tried was simply displaying a KML overlay of the impervious surfaces on top of the google map aerial imagery. This doesn't work well for any number of reasons. First, the KMLs get too large for google to display with 1000's of entries. Google resents having such large datasets and doesn't allow you to load the pop-up windows that I programmed into the KML (within ARCview) consistently for these large datasets. In addition, having such a large dataset is an overload of information: the customers using this system don't want to know about EVERY impervious surface in the city, so why am I bothering to show them all? The answer is that the KML-based solution is limiting. Its a great first step for displaying datasets, especially points and regions; but when you get down to the nitty gritty of thousands of polygons it ceases being easy/useful and starts becoming a liability.

So, what to do?

Well, the first thing I did was ask the great people out there on the web what they thought I should do. Their answer was to get just the information that people want to them, without displaying everything else. That way, the loading will go quick: be much less intensive, and demanding as far as network resources go as well as help the customer arrive at the specific information they require MUCH quicker than if I made them search around the map themselves looking for their property (and likely learning a new interface at the same time).

Okay, so we're going to have a map that displays imagery (perhaps the custom tiled imagery that I put there and discussed using earlier) and we're going to display just the polygons that the customer wants to see. How do we know what the customer wants? We're going to ask them of course. So our page is going to have another visual component - and that is a couple of search boxes; the first box will be for entering address buttons and the second will be for entering billing account information. NExt to these text input boxes, we're going to have a couple of submit buttons so they can input their query into the system.

The behind the scenes funtion of the website in this case is much more sophisticated than the earlier maps that I created. Those were just javascript, html, and the KML data files. The javascript was in two forms - the javascript which loads and runs the google application itself and the javascript that I composed to toggle the image layers. The KML data files are just a geographic flavor of XML with a bunch of markup to help google display the polyons and telling it how to label them effectively.

The new page is going to have two additional elements - php scripts and a MYSQL database. Php is a scripting/programming language that will allow us to run a query and generate a subsequent XML document that is composed of the information we need to display the correct polygons that our user wanted to look at (based on the query they entered). So, rather than being a static file containing all of the information from ALL of the impervious surfaces - the new website will be displaying temporary, dynamically generated XML document that will be deleted as soon as the page is closed, and contained within that XML will be the polygon information that match the query the user submitted. Great right?

Well, not so fast.

It is that simple as far as function goes, but problems arise. First and foremost of these is the fact that in order to query the database for the polygons to display - the polygons have to be in the database to begin with. How can a shape be in a databse, you ask? Well, because of geometry field types within the MySQL database, of course.

when I am displaying the polygons from the KML files - the geometry of each polygon was entire unknown to me: which is to say I didn't have any idea about how it got there. I'd just export the map from ArcVIEW using their built in shp2kml utility and the kml would appear, complete and when I loaded it all of the polygons would be there. An examination of the components of the KML file show how the polygon information is stored; which is a series of points that begin and end on the same line. There are lots of other attributes for the polygons in there too - opacity, colors, border width, etc.

So, we'll build a database, and in the database will be a cell and in that cell will be the same text that, when placed into the XML document will be the same as that produced by the KML-generating utility. And basically, that's what happens. We have geometry fields within the database program that I use (MySQL), and these field types know how to store the text that I enter into them - furthermore, they know so much about those data types they can convert them effectively and store them appropriately so that when recalled the data comes out as a polygon, which is great. There are two main ways that polygon information is stored, the first is so-called Well-Known-Text formation (WKT), which is a text desription of the polygon something like polygon((1,1 2,2 1,1) polygon X). That's great, easy to understand and 1 and 2 are the apprpriate coordinate pairs as you travel along the outside of the polygon. The other way the data is stored is as Well-Known-Binary (WKB) which is a binary translation of the Well-known-text and gets the information one step closer to machine code. As an aside, MySQL stores the data internally within the database in a format that does not conform to WKT or WKB, but in a third manner entirely, but that is not germane to our processes, because when we call data out of the database, we call using an AsText(field) function that converts the internal information back into WKT, which we can then use for our purposes.

Okay; so spatial data (points, lines, polygons, multipolygons) are capable of being stored in MySQL, but how does it get in there. Well, we could go in and populate the fields manually; its possible. We could generate the polygons ourselves with a GUI (Google has examples of this) and then store them in a geometry field in the MySQL database. But, that doesn't work for us - we're not interested in generating NEW data, we already have all the polygons we need. How do we take the polygons we have, in this case 1000's of them inside a SHP file editable by arcview and get them stuck into the sql database? The shp file doesn't store information in WKT, its not like we can copy them out and ESRI doesn't provide a utility to export SHP file information into a form recognizable for input by MySQL. What to do?

The solution to that problem is to use the shp2mysql utility (available here) that was developed for just this purpose. This utility is itself a slight modification of a similar script shp2PostGreS which imports into another similar database system, Post-GReSSQL. That database system is actually better for storing geographic information, but it isn't offered by a lot of hosts so it isn't as popular, even though it is better and questionably easier to implement. The shp2mysql utility is a command line program that takes in paraments (shape file name, database name, table name) and outputs a SQL script. That SQL script, when loaded in a MySQL database (by copying it into the SQL window, or importing in the PhPMyAdmin GUI, produces a table that contains all of the polygon data from the shp file - including ALL of the feature attributes stored in the .dbf file that is part of the SHP package you were working with.

There you go - a MySQL database filled with all of the feature attributes you were working with in ArcView and their associated attributes from the attribute table you were working on.

YEAH right? Well, that's actually just the easy part.

----------------------------------------------------------------------

Now, we've installed the capability to utilize the polygon information from the shp files we were working with - storing that information instead in a queryable MySQL database that sits on our web server and is happy to spit out just the specific information our customers are looking for.

But we have to program a web page that will do that.

This web page will have all of the same elements from the map page I created earlier - we'll use the google maps API; maybe we'll modify the map to utilize the imagery we have here to display at the higher zoom levels. How do we actually perform the query?

Well, we'll take the text that the user entered in the search boxes and we'll use PHP to submit to the database, querying the appropriate query based on which search box the data was entered into. The data that was entered by the user will become the paramater upon which the query is performed. For accounts the query will look something like this.

qrysql = "Select *, astext(OGC_GEOm) from tblISUs where ISU.Accts = '$querytext'"

The OGC_Geom field is the geometry field where our polygon information resides with the MySQL database, and the Astext() function makes the database spit out the polygon information as WKT, which we can use rather than the MYSQL geometry encoding which we can't use, and which also doesn't look nearly as pretty when seen on the screen.

The database will spit out some WKT, along with a lot of other nonsense (the other attributes for the data). We'll have to process, sort, and display that information with PhP so that what we have at the end is a pretty little XML document which has pop-ups that look just like we want them. I'd speak more on this part, but I haven't done it yet - so my understanding of the process is actually a bit rudimentary at best; but that's bound to change soon enough.

-------------------------------------------------------------------------------

So you have a database, and you want to start pulling individual polygons out of it. I decided to use PhP for that step of this process. Others have used varying solutions. For my project, I wanted to be able to generate documents on-the-fly, rather than just run the script once and be done with it - and PhP is a great workhorse for processing on demand.

The first thing I did was find someone who was doing anything even remotely similar. AT the time, I didn't know about writing PhP and I just wanted an out of the box solution I could copy and be done with. In the end, I had to learn at least a rudimentary amount of PhP to fix things, but it worked out. I worked off of Pamela Fox's tutorial about encoding KML points and lines using PhP. That was all well and good, but I needed to modify it to encode Polygons. I asked around, and a nice fellow named Ralf in Germany gave me some modified code (modified from Pamela's Tutorial) that I started to work with. One problem was that Pamela's code was in PHP 5, and Ralf's was in PhP 4, so I had trouble deciphering that for a while. Eventually I started to realize what was happening in the code - something called the Document Object Model (DOM) was getting referenced a lot, and I started to get the jist of the object oriented programming of the PhP script. I also found a tutorial that did exactly what I wanted to do - namely the author took multipolygons and spit them out as a javascript array; BUT, the program was a one-time program (which he wrote using Ruby) rather than something that my server would allow me to run each time a new account was requested. So, I had to write the PHP by in large mostly myself, but it wasn't as hard as I first imagined.

I have had people question my approach. Why create KML's, they wonder - it seems like a wasted step. Well, for a couple of reasons. The first is that when I started this, I didn't know any other way of getting data displayed in google maps. I know other people build javascript overlays, but I don't know how to do that, and I don't know how they would be formatted. KMLs are a geographically flavored XML document, which means they are very orderly and structured perfectly and consistently - that means that creating them using DOM is a cinch - whereas creating the javascript overlay is not as easy (or at least it wasn't for me). Secondly, I should have an idea of how much the program is being used by how many KML's are getting generated, which is fun - and although there are other ways to record this information, its just one more confirmation that people are using the program like they are supposed to. The KML files which are created are not large - 4-20kb each and even if every polygon is made into a KML, the size will amount to only a trivial amount of storage space.

So, basically our PhP program (LINK) does several things:

1. Request information in the database meeting our search criteria (a well-formed SQL statement)
2. It takes one row of that data at a time and processes it.
2a. It takes the multipolygon object in the OGC_GEOM field, and chops out the beginning part and the end parenthesis
2b. It takes the remaining part (mostly coordinates) and explodes them into an array, separating at commas.
2c. It takes those exploded values, and further separates them into individual coordinates (lats and longs)
2d. It puts the exploded values back together again in the same format they will be displayed in the KML.
3. It formats a complete and well-formed KML document using information within the table
3a. Integrates the coordinates from step (2) into the KML document, along with other attributes from the table.
3b. Formats the text boxes that will pop up when the polygons are clicked in google maps or google earth.
3c. Formats the appearance of the polygons displayed, coloring the polygon and its lines and setting the line width.
4. Saves the formatted KML text as a KML file in its own right in the server.
5. Sends a variable to a google maps Page and integrates the new KML as a geoXML overlay

Here is an example of one of the KML files that is produced from the program:

for loading in google earth
for just the txt of the KML file

AS you can see, apart from being programmed correctly, the KML contains all of the information we had hoped to include about the impervious surface polygon in question, including its area, how many ISU's that area represented, which account it was assigned to and what the address of that account is as well as the TYPE of surface - be it a parking lot or a rooftop.

Getting the formatting to work perfectly was a challenge, but in the end it works perfectly. I will say so far that things look great: the only exception I make to that is that some of the polygons look rather simplified (fewer points than they should have) and I'm looking into why that would be the case.

Okay, so in gettting the information back OUT of mySQL you're going to find you have a bit of a problem. The problem is that shp2mysql encodes the data within the shp files as multipolygons. So, even after you get the PhP to successfully parse everything, there will still be some errors due to that. I solved this problem by simply deleting the holes: that is - when I had a polygon with holes in it, I just let the hole get filled in by deleting the secondary coordinates that dealt with the shape of the hole. My dataset didn't have many holes, so it didn't make much of a problem for me, but I am confident now that one could compose a solution to deal with that problem, but it would take a good deal of work more than I was prepared to devote to it.

The map portion could be just a simple map: I used the google maps API example code and added a toggle button to turn the polygon overlay On and OFF. I also tweaked the controls to look like the google makes default UI.

As soon as I loaded the polygons, I noticed a problem - namely that the polygons that I had so carefully drawn around every impervious surface in the city were in fact not lining up the way they should be. That is, I'd have a building outline drawn, but the building I had drawn it for would be somewhere nearby but not underneath my polygon: or it would be partly underneath, but the borders wouldn't line up. It was very frustrating, to tell you the truth. That was when I knew that I was going to have to have the imagery viewed in the browser window be the same as the imagery that I drew the polygons from. This was a whole separate digression that involved using custom tiles drawn from the georeferenced imagery I was working from using MAPTILER. That program worked like a champ - if you're working form MrSID files like I was, you should give it a try, the program was awesome.

So, in the end I used my custom map tiles as the aerial imagery background that the polygons are displayed over. The images I tiled are not only perfectly matched with te polygons I am working with - but the resolution of the imagery is much higher than what is currently offered by google earth. As a result, the clients using this program are able to see much more of what they want to see, and make the comparisons I wanted them to be able to make.

A working version of what I have is available if you'd like to see it. This page takes account numbers and spits out a google map with the relevant KML loaded in it and colored appropriately. The page also includes a link at the bottom to display another account number's impervious surfaces - that is just for ease of use so that the user doesn't have to leave the page to get to the next surface they want to evaluate.

After this was done, I proceeded to tweak it and add additional functionality. I also wanted to create this tutorial for folks who wanted to do something similar but didn't have the resources to complete the project (as I didn't) without some additional help.

I hope this has been useful. If it has, and you'd like to say "thanks," I'd love to recieve an email.

Would you like to know more?...

Monday, April 27, 2009

Tutorial: How to convert or import your MS access data into MySQL

Friday, April 10, 2009

Creating KML from PHP and displaying in Google Maps API

Blog Archive