Thursday, August 26, 2010

Screen Scraping to fill in database fields.

So, I've been a bit annoyed because the publicly available database that I can get my hands on isn't full of all of the information that I want. Luckily, that information is available online, so I can use a technique called screen scraping to push the information into the relevant parcel database by knowing the location of the information I want on the web page that talks about it. Thus, I will have a database will all of the new information included.

The code is below and pushes the name of a property owner into the parcels database that I have setup



ini_set('display_errors', 1);
error_reporting(E_ALL);

// Opens a connection to a MySQL server.
$username="user";
$password="pass";
$database="db";
$connection = mysql_connect ('localhost', $username, $password);
if (!$connection) {
die('Not connected : ' . mysql_error());
}
// Sets the active MySQL database. You can't select a database unless the connection is open, can you?
$db_selected = mysql_select_db($database, $connection);
if (!$db_selected) {
die('Can\'t use db : ' . mysql_error());
}
$numPts = 100;
$startpt = 300;
$totalpts = $startpt + $numPts;
for ($i = $startpt; $i <= $totalpts- 1; $i++) {

// Pull out RP_ACCT_ID for a given ID key value: INDEX RP_ACCT_ID
$count = $i;
$query1 = "SELECT RP_ACCT_ID from parcels where parcels.ID='$count'";
$result = mysql_query($query1);
//echo $query1;
//echo $result;
while ($row = mysql_fetch_array($result)){
$accounta= $row['RP_ACCT_ID'];

}
// processes the valid RP_ACCT_ID to get the owner from the website:
$entry=$accounta;
$address="http://146.218.200.12/pls/ilisw/lis.online_lis_info_pkg.acct_info?p_in_rp_acct_id=";
$address=$address.$entry;
$file=file_get_contents($address);
$webpage=addslashes($file);
$start = strstr($file, "Name:");
$startlen = strlen($start);
$end = strpos($start, "Mailing");
$end = $end - 154;
$name = substr($start, 92, $end);
$start=$start+74;
$end = strpos($file, "Mailing Address:");

// insert owner back into database (combine these updates)
$query2 = "UPDATE `db_kitsap`.`parcels` SET `OWNER` = '$name' where parcels.RP_ACCT_ID='$accounta'";
$result = mysql_query($query2);


//insert web page back into database
$query3 = "UPDATE `db_kitsap`.`parcels` SET `WebPage` = '$webpage' where parcels.RP_ACCT_ID='$accounta'";
$result = mysql_query($query3);
}


No comments:

Post a Comment