CMPS 350 Lab 08 - "Parsing XML with PHP"

resources:
php regex match
PHP sample code
firebug extension
PHP manual
XML parsing
printf man page

BACKGROUND.
PHP is a server-side scripting language for web programming. Server-side means that the PHP script is executed by the web server and not by the client's browser. The PHP language targets programming tasks one is likely to do in a web environment. The basic syntax of PHP is a blend of C and Perl with some object-oriented features thrown in. One difference between Perl and PHP (both use dynamic typing) is that in PHP all variables (scalars and non-scalars) are prefixed with '$'. PHP has had full object-oriented support since PHP 5.

This lab assumes you understand the basic PHP-HTML form interface introduced in Lab 07. If not, copy these sample files into your public_html/lab08 folder so you can study their behavior and syntax.

      /home/fac/melissa/public_html/cs350-f15/Code/PHP/form.php 
      /home/fac/melissa/public_html/cs350-f15/Code/PHP/form.html 

You cannot view the PHP source from your browser since the web server always executes PHP files rather than sending them as text. You should also read this before starting. Test form.html by opening it in a browser. The form.php server-side script provides an interface between data the user enters into form.html on the client and a process running on the server (the PHP engine) that grabs the form data and passes that data to a script. The big advantages of server-side scripts are that 1) everything on the server can be hidden/secured from the client's web browser and 2) you have full access to files on the web server within the bounds of your file permissions on the server.

When you hit the Submit Form button, whatever action you specify in the form tag is called. If the action begins with javascript:, e.g., javascript:test(), the browser looks for the function test() within the scope of the client document. If, on the other hand, the action does not begin with javascript:, the browser assumes the action is for a script on the server. Example:

    <form method="post" action="form.php">

In the example above, the PHP script 'form.php' is executed on the server. How does the transfer of data occur? Notice the method="post" form attribute. The client browser sends all the form data to the server in the body of the HTTP request, and the PHP engine places that data in the $_POST associative array for access by the PHP script. The $_POST form elements are accessed in the array by a key, which is the element's name attribute in the HTML form; e.g.:

    HTML tag:     <input type="text" name="name" id="name">
    PHP variable: $name = $_POST['name'];
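
The lookup above can be made tolerant of missing fields. The following is a minimal sketch, not taken from the sample files: the helper name form_value() is hypothetical, and it assumes each field holds a single string rather than an array. It uses isset() so that reading a field the user never submitted does not raise an "Undefined index" notice.

```php
<?php
// Hypothetical helper (not part of the sample files): return a form field
// from $_POST, or a default value when the field was not submitted.
// Assumes the field is a plain string, not an array.
function form_value($key, $default = '')
{
    return isset($_POST[$key]) ? trim($_POST[$key]) : $default;
}

// Simulate a submission so the sketch also runs from the command line.
$_POST = array('name' => '  Ada  ');

$name   = form_value('name');            // surrounding spaces are trimmed
$entree = form_value('entree', 'none');  // field is missing, so the default is used

// Escape values before echoing them back into an HTML page.
printf("name = %s, entree = %s\n", htmlspecialchars($name), htmlspecialchars($entree));
```

Whether to trim input or supply defaults is a design choice; the sample form.php may handle missing fields differently.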

Communication back to the client browser occurs in a new browser page. Make sure you understand everything in the sample script. How do you test/debug PHP? You can quickly test your script for syntax errors by running it from the command line: $ php form.php . PHP will give you several notices of this type that you can ignore:

PHP Notice: Undefined index: entree in /export/home5/users/melissa/public_html/cs350-f15/Code/PHP/form.php on line 14

This is normal if you run the script without entering data into the form first. Any other errors need to be fixed. (ASIDE) If you want to combine your HTML form and your php script into one file, look at the example of doing so here:

 /home/fac/melissa/public_html/cs350-f15/Code/PHP/combined.php 

This lab assumes you will use two files, lab08.html and lab08.php. While debugging, you can print everything except arrays nested within arrays to the screen with this code:

  foreach($_POST as $key => $value) {
     printf("%s = %s<br>", $key, $value);
  } 
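
If a form does contain nested arrays (for example, a checkbox group), the loop above can be extended. This is a sketch under assumptions: the function name dump_form() and the sample data are made up for illustration. It relies on print_r() with a second argument of true, which returns the formatted text instead of printing it.

```php
<?php
// Hypothetical helper: format every form field, including nested arrays
// such as checkbox groups named "toppings[]".
function dump_form(array $data)
{
    $out = '';
    foreach ($data as $key => $value) {
        // print_r(..., true) returns its output as a string.
        $text = is_array($value) ? print_r($value, true) : (string) $value;
        // Escape both parts so form input cannot inject HTML into the page.
        $out .= sprintf("%s = %s<br>\n", htmlspecialchars((string) $key), htmlspecialchars($text));
    }
    return $out;
}

// Sample data standing in for $_POST.
echo dump_form(array('name' => 'Ada', 'toppings' => array('ham', 'olives')));
```

In a real script you would call dump_form($_POST) and remove the call once debugging is finished.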

This lab covers one more of the web-specific features of PHP -- parsing an XML file. XML is a text-based standard for representing structured data. XML looks like HTML, with some important exceptions. The tag names are specific to whatever data is in the file -- you make them up to match the data. XML is hierarchical and requires a single top-level tag in the file. Every opening tag must have a closing tag. In XML there is a distinction between attribute data and element data. In the XML element below, the attribute "name" has value "santa claus". The attribute "id" has value "567". The element data (aka cdata or character data) has value "this is just random text". Cdata has no key associated with it.

    <stuff name="santa claus" id="567">
          this is just random text 
    </stuff>
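
To see the attribute/character-data distinction in code, here is a small sketch that parses the element above with PHP's built-in xml_parse_into_struct(). It is only an illustration and is not how parser.php works; the variable names are arbitrary. The text is written on one line here, and trim() removes any surrounding whitespace.

```php
<?php
$xml = '<stuff name="santa claus" id="567">this is just random text</stuff>';

$parser = xml_parser_create();
// Keep tag and attribute names as written; the parser upper-cases them by default.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
// Fills $values with one array per element found in the document.
xml_parse_into_struct($parser, $xml, $values);

$element = $values[0];
printf("tag: %s\n", $element['tag']);
// Attributes arrive as key => value pairs.
foreach ($element['attributes'] as $key => $value) {
    printf("attribute %s = %s\n", $key, $value);
}
// Character data has no key of its own; it is stored under 'value'.
printf("character data: %s\n", trim($element['value']));
```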

Once you have a file that matches XML guidelines, PHP has a built-in parser that will grab the data in the file. You just need to manipulate the data based on the structure of the XML file. You will start with some working PHP code for a parser and an XML file. The current parser does this. The parser functions in the sample code make use of global variables. If you are familiar with PHP's object-oriented capabilities and wish to do so, you can make a parser class and encapsulate the global variables inside the class. Understanding the parser is easier without the OO encapsulation. Copy the code from here into your public_html directory on sleipnir:

  $ cp /home/fac/melissa/public_html/cs350-f15/examples/week08/parser.php .
  $ cp /home/fac/melissa/public_html/cs350-f15/examples/week08/data.xml .
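
Before reading parser.php, it may help to see the event-driven style in miniature. The sketch below is not the contents of parser.php: the handler names (demo_start, demo_end), the globals, and the inline XML are all made up for illustration. It registers start and end handlers with xml_set_element_handler() and uses global variables to track nesting depth.

```php
<?php
// Globals shared with the handlers, in the style the lab describes.
$depth = 0;
$lines = array();

// Called once for each opening tag; $attrs holds that tag's attributes.
function demo_start($parser, $name, $attrs)
{
    global $depth, $lines;
    $lines[] = str_repeat('  ', $depth) . $name . ' (' . count($attrs) . ' attributes)';
    $depth++;
}

// Called once for each closing tag.
function demo_end($parser, $name)
{
    global $depth;
    $depth--;
}

// Made-up XML for illustration; data.xml has its own tags.
$xml = '<list><item id="1" name="a"/><item id="2"/></list>';

$parser = xml_parser_create();
xml_set_element_handler($parser, 'demo_start', 'demo_end');
// The third argument tells the parser this is the final chunk of input.
if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}
echo implode("\n", $lines), "\n";
```

Note that the parser upper-cases tag names by default, so the handlers receive LIST and ITEM here.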

LAB 08 REQUIREMENTS

  1. ITEM 1. Copy parser.php to lab08.php. Your first task is to add an event handler to lab08.php that grabs the character data within XML tags. In data.xml, the character data holds some free-form text. The startElement and endElement event handlers are already coded for you. Follow the example in these handlers to code the character data handler. Add the cdata text to a field NOTES in the associative array for each item. Display the NOTES text on the screen in your report. The syntax to create a handler is:
     
     xml_set_character_data_handler($xml_parser, "XML_characterData");
    
     The function prototype for the handler is:
     function XML_characterData($parser, $data)
     {
      // $data holds the character data 
      // add $data as another field in the associative array for each record.
    
     }
    
  2. ITEM 2. Write a function report() that traverses the $software array that you parsed in Item 1. Function report() should generate nicely formatted output. Refer to the printf man page for help in formatting the output. Click here to see the output of my solution. Your output should be similar to mine.

  3. ITEM 3. Add a page lab08.html as a front-end to lab08.php. Your lab08.html file will contain a form that upon submission calls lab08.php. The purpose of this form is to simply allow the user to type in a regular expression. You can use my lab08.html file. Modify it if you wish. Copy it from here:
     $ cp /home/fac/melissa/public_html/cs350-f15/solutions/lab08/lab08.html . 
    
    In lab08.php, add a function findbyregex() that traverses the $software array and, for each entry in the array, matches the regex from the user against the category and description fields. The PHP function you need is:
         preg_match($pattern,$string); 
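
One detail of preg_match() is easy to miss: the pattern must be wrapped in delimiters, and the function returns 1 for a match, 0 for no match, and false on error. The sketch below shows one way to handle text typed into a form. The helper name matches_pattern(), the '~' delimiter, and the sample array keys are assumptions for illustration, not part of the lab's sample code.

```php
<?php
// Hypothetical helper: test user-supplied regex text against one string.
// Returns true on a match, false on no match, and null if the pattern is invalid.
// Assumes the user's text does not itself contain the '~' delimiter.
function matches_pattern($regex, $text)
{
    $pattern = '~' . $regex . '~';
    // @ suppresses the warning PHP emits when the pattern fails to compile.
    $result = @preg_match($pattern, $text);
    if ($result === false) {
        return null;
    }
    return $result === 1;
}

// Made-up record; the real keys in $software may differ.
$entry = array('category' => 'editor', 'description' => 'A text editor for programmers');

foreach (array('category', 'description') as $field) {
    $hit = matches_pattern('^edit', $entry[$field]);
    printf("%-12s %s\n", $field, $hit ? 'match' : 'no match');
}
```

Reporting an invalid pattern back to the user (the null case) is usually friendlier than letting the page fail silently.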

Note: If you get the white screen of death on the webpage, review the php_errors.log file in the directory where your script resides.

HOW YOUR LAB WILL BE GRADED

Your files must reside here:

  /home/stu/{username}/public_html/cs350/lab08/lab08.html
  /home/stu/{username}/public_html/cs350/lab08/lab08.php

Your lab08.php script should have 600 permissions.

I will test your code by running your code from the URL on this list. Your code should work like mine.