Welcome to webmining's website!

Knowlesys Products

knowlesys, web data extraction









Overview




 










The web is an ocean of information containing more than 10 billion pages,
about 90% of which are in unstructured or semi-structured formats. It is
currently growing by roughly 1 million pages per day. Information is
expanding at an explosive rate, while people's time and energy are limited.
Information of real value to enterprises and individuals lies somewhere in
this worldwide ocean, and how to extract it has become one of the most
pressing tasks for research institutions working in Information Retrieval,
Data Mining, Knowledge Management and Competitive Intelligence.


The Blue Whale Web Data Extraction System (BWDES) is like a huge blue whale
that cruises this information ocean every day. It automatically and
accurately extracts the information that is valuable to you from the ocean of
web pages, while excluding the multitude of useless content such as page
headers and footers, column listings and advertisements.


Over more than three years, Knowlesys Software, Inc. has developed BWDES, a
powerful web information extraction system. It has a layered architecture and
a loosely coupled modular design comprising many sub-systems. BWDES can
extract designated information from the web in large volumes and load it into
specified relational databases, helping customers dig precious stones out of
the Internet mine. The process converts information from semi-structured to
structured form, from a dispersed state to a concentrated one, from remote
content to locally stored data, and from visual pages to digital records, so
you can make extensive use of it afterwards.
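
To make the semi-structured to structured conversion concrete, here is a
minimal sketch in Python. It is not BWDES code: the sample page, the
RowExtractor class, the helper functions and the products table are all
hypothetical, and SQLite stands in for whatever relational database is used.
It keeps only the table cells of a page and ignores the header and footer
text around them.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page: a product table wrapped in header and footer noise.
SAMPLE_HTML = """
<html><body>
<div class="header">Site navigation</div>
<table id="products">
<tr><td>Widget</td><td>9.50</td></tr>
<tr><td>Gadget</td><td>12.00</td></tr>
</table>
<div class="footer">Copyright notice</div>
</body></html>
"""


class RowExtractor(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None        # cells of the row currently being parsed
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(tuple(self._row))
            self._row = None

    def handle_data(self, data):
        # Text outside table cells (header, footer) is ignored.
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


def extract_rows(html):
    """Return the table rows found in the page as tuples of strings."""
    parser = RowExtractor()
    parser.feed(html)
    return parser.rows


def store_rows(conn, rows):
    """Insert (name, price) records into a relational table."""
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products (name, price) VALUES (?, ?)", rows)
    conn.commit()


rows = extract_rows(SAMPLE_HTML)
conn = sqlite3.connect(":memory:")
store_rows(conn, [(name, float(price)) for name, price in rows])
stored = conn.execute("SELECT name, price FROM products ORDER BY name").fetchall()
print(stored)
```

Once the records are in a table, they can be queried, joined and exported
like any other relational data, which is the benefit the paragraph above
describes.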


BWDES can extract data from various types of websites. In addition to
extracting field data from semi-structured pages, it can also extract
free-text information such as e-mail addresses, as well as many types of
multimedia files.
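
As an illustration of free-text extraction, the sketch below pulls e-mail
addresses out of plain text with a regular expression. The pattern and the
extract_emails function are assumptions made for this example; they are
deliberately simple, do not cover every valid address form, and say nothing
about how BWDES itself does it.

```python
import re

# Simplified pattern: local part, "@", then a dotted domain name.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return unique addresses in order of first appearance, lowercased."""
    seen = set()
    result = []
    for match in EMAIL_PATTERN.findall(text):
        address = match.lower()
        if address not in seen:
            seen.add(address)
            result.append(address)
    return result


sample = "Contact sales@example.com or Support@Example.org. Again: sales@example.com"
emails = extract_emails(sample)
print(emails)
```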



BWDES is characterized by stable operation, intelligent crawling and accurate
extraction. It is an information extraction platform: when a new extraction
task is required, you use the platform to configure a new web crawling and
extraction script and its parameters.


BWDES includes a general database access layer that lets its back end connect
to any relational database, such as MS SQL Server, Oracle, DB2, Sybase, MySQL
and InterBase, and even to file-based databases such as Access. Regardless of
the database type, the extracted data can be inspected with a general
database browser and exported to various formats such as XML, CSV, HTML,
Excel and so on.
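
The idea of a database-independent export step can be sketched with Python's
standard DB-API. The function names and the contacts table below are
hypothetical, and SQLite is used only because it ships with Python; any
driver that provides a DB-API connection should work with fetch_table in the
same way.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def fetch_table(conn, query):
    """Run a query on any DB-API connection; return column names and rows."""
    cursor = conn.cursor()
    cursor.execute(query)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


def to_csv(columns, rows):
    """Serialize the result set as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(columns, rows, root_tag="records", row_tag="record"):
    """Serialize the result set as XML; column names must be valid tag names."""
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Demonstration with an in-memory SQLite database and made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, email TEXT)")
conn.execute("INSERT INTO contacts VALUES ('Ann', 'ann@example.com')")
columns, rows = fetch_table(conn, "SELECT name, email FROM contacts")
csv_text = to_csv(columns, rows)
xml_text = to_xml(columns, rows)
print(csv_text)
print(xml_text)
```

Because the export functions only see column names and rows, they do not
need to know which database produced the data.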

For more information, please visit our website: http://www.knowlesys.com











Date: 16 June 2008, Monday