Walkthrough

-------


Phase 1: Object and Element Extraction

  1. Visit the main page of the XWRAP Elite service and enter the address of a website that you want to wrap, such as http://www.netlibrary.com/.

  2. The XWRAP Elite proxy service loads the website for you. Browse it (left screenshot) and locate the kind of web page you want to wrap by clicking on a link or entering a search (right screenshot). When you are on the exact page you want to wrap, click the data extraction button in the XWRAP Elite service panel in the top-left corner of the page.

  3. You are now in the object extraction step, which locates the subtree of an HTML document that contains the data objects and extracts those objects.

    • There are six heuristics for locating the subtree that contains the data objects of interest. According to our preliminary study, the first (default) heuristic works best, but in certain cases another heuristic may work better. You can switch heuristics manually by clicking a number or NEXT.
    • If most objects are correct, give the objects a name. For example, you can enter "book" for Amazon's book search results and "auction item" for eBay's auction search results.
    • Click the Element Extraction button.
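The six heuristics are not described in this walkthrough. As an illustration only of what a subtree heuristic can look like, the sketch below picks the node of a tag tree with the most direct children. The rule (highest fan-out), the class SubtreeHeuristicSketch, and the Node type are assumptions made for this example and are not part of the XWRAP Elite library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one subtree heuristic; NOT XWRAP Elite's actual rule.
final class SubtreeHeuristicSketch {

    /** Minimal tag-tree node: a tag name plus its child nodes. */
    static final class Node {
        final String tag;
        final List<Node> children = new ArrayList<>();

        Node(String tag) {
            this.tag = tag;
        }

        /** Appends a child and returns this node so calls can be chained. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Returns the node with the most direct children in the tree rooted at
     * root. On ties, the node visited first in pre-order wins.
     */
    static Node highestFanOut(Node root) {
        Node best = root;
        for (Node child : root.children) {
            Node candidate = highestFanOut(child);
            if (candidate.children.size() > best.children.size()) {
                best = candidate;
            }
        }
        return best;
    }
}
```

On a results page, a rule like this tends to select the table or list whose rows are the data objects, which is why a different heuristic can do better on pages with large navigation menus.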

  4. You are now in the element extraction step, where objects are separated into elements.

  5. Click the Refine button to enter the extraction refinement step.

    • You can refine the extraction by modifying the object size, that is, the number of elements in an object. Garbage objects that contain too many or too few elements are pruned out if you specify a suitable object size range. The default values are computed by our heuristics; you can change them at your own judgement.
    • Advanced tuning is for expert XWRAP Elite users; in most situations you don't need to change it. The recommended tag separators are HTML tags that always separate elements in objects. XWRAP Elite discovers them automatically. If you know that a tag always separates elements but it is not in the list, you can add it manually. Tag separators are delimited by commas. The recommended text separators are strings that always separate elements in objects. Each text separator is enclosed in quotation marks, and the quoted strings are delimited by commas.
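To make the two separator formats concrete, here is a minimal sketch of how such input could be parsed. The class SeparatorListSketch and its methods are hypothetical, and the sketch assumes that tags are typed as bare names (for example br, not <br>); how XWRAP Elite itself parses these fields is not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the two separator fields; not XWRAP Elite code.
final class SeparatorListSketch {

    // Matches one double-quoted string and captures the text between the quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    /** Splits a comma-delimited list of tag names, e.g. "br, hr,td". */
    static List<String> parseTagSeparators(String input) {
        List<String> tags = new ArrayList<>();
        for (String part : input.split(",")) {
            String tag = part.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    /**
     * Extracts quoted text separators, e.g. "\"by\", \" - \"". Reading the
     * quoted strings directly (instead of splitting on commas) keeps commas
     * and spaces that are part of a separator.
     */
    static List<String> parseTextSeparators(String input) {
        List<String> separators = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            separators.add(matcher.group(1));
        }
        return separators;
    }
}
```

The quotation marks matter for text separators because a separator such as " - " contains spaces that would otherwise be lost when the list is trimmed.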

  6. Click the Submit button to see the results after pruning.

    • If you are not satisfied with the results, go back to the refinement panel and enter a new object size range.
    • If the results turn out to be positive, click on Element Tagging.
    • If the results are still not good enough after you have tried various object size ranges and recommended separators, XWRAP Elite cannot improve the element extraction any further. You can still get a wrapper that only extracts objects by clicking the Code Generation Only For Object Extraction button.
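The size-based pruning from steps 5 and 6 can be sketched as a simple range filter. This is an illustration under assumptions: each object is modeled as a list of element strings, and the class ObjectSizePruningSketch is hypothetical rather than an XWRAP Elite library function.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of pruning by object size; not XWRAP Elite code.
final class ObjectSizePruningSketch {

    /**
     * Keeps only the objects whose number of elements lies in the inclusive
     * range [minSize, maxSize]; everything else is treated as garbage.
     */
    static List<List<String>> prune(List<List<String>> objects, int minSize, int maxSize) {
        if (minSize > maxSize) {
            throw new IllegalArgumentException("minSize must not exceed maxSize");
        }
        List<List<String>> kept = new ArrayList<>();
        for (List<String> object : objects) {
            int size = object.size();
            if (size >= minSize && size <= maxSize) {
                kept.add(object);
            }
        }
        return kept;
    }
}
```

For example, with a range of 3 to 4, an object holding a title, an author, and a price is kept, while a one-element advertisement block is pruned. Widening or narrowing the range and resubmitting corresponds to calling prune again with new bounds.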

  7. In this step (element tagging), enter a tag name of your choice for each extracted element.

    • Find an object with the maximum number of elements in the extraction results.
    • Enter a tag name for each element in the object. Note: if an element is useless and you don't want it in the final wrapping result, leave its element name starting with "AutoGeneratedItem".
    • Make sure the element types are correct. XWRAP Elite detects element types automatically; change a type if the detected one is wrong.
    • For string-type elements, you may want to use XWRAP Elite alignment rules: any element that is similar to the alignment hint is then treated as this particular element. Alignment rules are very effective for elements that act as identifiers, such as "read more" or "Ships within 24 hours".
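The tagging conventions above can be sketched as follows. The similarity measure that XWRAP Elite uses for alignment hints is not specified in this walkthrough, so the sketch assumes the simplest possible one: equality after trimming, collapsing whitespace, and ignoring case. The class ElementTaggingSketch and its method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of element tagging rules; not XWRAP Elite code.
final class ElementTaggingSketch {

    // Elements whose name still starts with this prefix are left out of the result.
    static final String AUTO_PREFIX = "AutoGeneratedItem";

    /** Trims, collapses runs of whitespace to one space, and lower-cases. */
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Assumed similarity test: the normalized text equals the normalized hint. */
    static boolean matchesHint(String elementText, String hint) {
        return normalize(elementText).equals(normalize(hint));
    }

    /** Returns a copy of the name-to-value map without auto-named elements. */
    static Map<String, String> dropUntagged(Map<String, String> elements) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : elements.entrySet()) {
            if (!entry.getKey().startsWith(AUTO_PREFIX)) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

A fixed label such as "read more" is a good alignment hint precisely because it appears with nearly identical text in every object, so even a strict comparison like this one identifies it reliably.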

  8. After you have tagged the elements of an object, you can run the service to generate the extraction module as a Java class.

  9. You are now at the final step of the object and element extraction phase. If you only want an extraction component, you can already download the Java source code, which, as you will see, calls the XWRAP Elite library functions. If you want to run the extraction module on another computer, you need to download the Java class object code.


Phase 2: Search Interface Extraction

  1. The first step in Phase 2 is to choose a placeholder string that is unique in the context of the given web pages, and to replace the search keyword value in the given URL with that placeholder. In the example screenshot, the default placeholder string $$ replaces the search keyword value "Java" (see the screenshot above).

  2. You are now at the second step of the Search Interface Extraction phase. Choose a name for keyword1 from the list of element tag names, or simply enter a new name, say "Full Text Search" in this case.

  3. You are now at the third and last step of the Search Interface Extraction phase: the code generation step.
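The placeholder substitution from step 1 of this phase can be sketched as below. The class SearchUrlTemplateSketch is hypothetical, and the URL used in the usage note is a made-up example, not the one from the screenshots; the code that XWRAP Elite actually generates may work differently.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical illustration of the placeholder technique; not generated wrapper code.
final class SearchUrlTemplateSketch {

    /** Turns a sample search URL into a template by replacing the keyword value. */
    static String toTemplate(String sampleUrl, String keywordValue, String placeholder) {
        if (sampleUrl.contains(placeholder)) {
            // The placeholder must be unique, so it may not already occur in the URL.
            throw new IllegalArgumentException("placeholder already occurs in the URL");
        }
        int at = sampleUrl.indexOf(keywordValue);
        if (at < 0) {
            throw new IllegalArgumentException("keyword value not found in the URL");
        }
        return sampleUrl.substring(0, at) + placeholder
                + sampleUrl.substring(at + keywordValue.length());
    }

    /** Builds a concrete search URL by URL-encoding the keyword into the template. */
    static String fill(String template, String keyword, String placeholder) {
        try {
            // String.replace is literal, so "$$" needs no regex escaping.
            return template.replace(placeholder, URLEncoder.encode(keyword, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For instance, toTemplate("http://www.example.com/search?query=Java&page=1", "Java", "$$") gives http://www.example.com/search?query=$$&page=1, and filling that template with "data mining" gives http://www.example.com/search?query=data+mining&page=1. The uniqueness requirement exists because any other occurrence of the placeholder in the URL would also be replaced at search time.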

Phase 3: Wrapper Code Packaging

  1. The first task in the code packaging phase is to specify the name of your wrapper.

  2. You are now in the code packaging step, and the source code and object code can be downloaded. If you want to run a few tests before you download the code, you can do so using the search interface of your wrapper.

  3. Below is the test run result. The search keyword for the test in this case is the word "future". The search context is all text fields.

  4. You may also perform an advanced search by entering a second or third search field. For example, you can enter the word "Day-trader" in the title field (see left screenshot below); the result is shown in the right screenshot below.

Using Wrapper Repository

  1. You may register your wrapper in the wrapper repository as shown below.

  2. You may also search the wrapper repository by keyword matching on wrapper name, source URL, author name, and so on. For example, the left screenshot below shows a keyword search on source URL, and the right screenshot shows the result of this search.

  3. When you click on the detail, the following screen is displayed.
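Keyword matching over repository fields, as described in step 2, can be pictured with the small in-memory sketch below. The class WrapperRepositorySketch, its Entry type, and the case-insensitive substring match are all assumptions made for illustration; the actual repository service and its matching rules are not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical in-memory model of a wrapper repository; not the real service.
final class WrapperRepositorySketch {

    /** The searchable fields named in the walkthrough. */
    enum Field { NAME, SOURCE_URL, AUTHOR }

    /** One registered wrapper. */
    static final class Entry {
        final String name;
        final String sourceUrl;
        final String author;

        Entry(String name, String sourceUrl, String author) {
            this.name = name;
            this.sourceUrl = sourceUrl;
            this.author = author;
        }

        String get(Field field) {
            switch (field) {
                case NAME: return name;
                case SOURCE_URL: return sourceUrl;
                default: return author;
            }
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Registers a wrapper in the repository. */
    void register(Entry entry) {
        entries.add(entry);
    }

    /** Returns entries whose chosen field contains the keyword, ignoring case. */
    List<Entry> search(Field field, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<Entry> hits = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry.get(field).toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(entry);
            }
        }
        return hits;
    }
}
```

Searching on Field.SOURCE_URL with the keyword "netlibrary" would, in this model, return every registered wrapper whose source URL contains that string.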


-------


For problems or questions regarding this website, contact [XWRAP Elite].
Last updated: April 06, 2000.