Sunday, October 27, 2013

Simple Chinese Word Segmentation Lib for Flash AS3 - SCWS Ported to Flash Using CrossBridge

Unlike English, written Chinese puts no spaces between words (http://en.wikipedia.org/wiki/Text_segmentation), so a program has to split a sentence into words before it can do much else with the text.

SCWS is a simple Chinese word segmentation library written in C. I have just ported it to Flash using CrossBridge, the latest open source version of FlasCC. You can use the pre-built swc library "libscws.swc" in your Flash/AS3 projects.

The SCWS lib depends on an extra ".xdb" dictionary file and an ".ini" rule file, both of which can be downloaded at http://www.xunsearch.com/scws/download.php. However, CrossBridge's file system is not as simple as the one in the old Alchemy (see this post, and simplified code), so I use the class by twistedjoe from http://forums.adobe.com/thread/1147910, which doesn't require any genfs processing of the files.

The original C source files are almost unmodified. The one exception is "lock.c", where I commented out the following line to stop gcc from complaining:

//#warning no proper flock supported

To use the swc library, you must set the compiler option "enable strict mode" to false (-strict=false on the mxmlc command line)! Otherwise, the AS3 compiler will report "Error: Call to a possibly undefined method addEventListener through a reference with static type CrossBridge.libscws.vfs:URLLoaderVFS".

There are two main functions in the AS3 library: "initialize_SCWS_AS3()" and "scws_send_text_AS3()".
To use "libscws.swc", first load the dictionary file and the rule file and supply them to the C module. This follows the usual CrossBridge/FlasCC routine: use the "loadManifest" function of a URLLoaderVFS to load the manifest file, which lists the names and paths of the files. (See the demo's source code for details; https://github.com/twistedjoe/flascc-URLLoaderVFS has more information on the manifest file.) Once the dictionary file and the rule file have loaded, call "initialize_SCWS_AS3()" to initialize the library. Then call "scws_send_text_AS3(input:String):String" with the text to be processed; it returns the processed text with the words separated by spaces.
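
Since the result comes back as one space-delimited string, you will usually want to turn it into a list of words. Here is a minimal sketch of that step. The class name "SegmentedText" and its "toWords" helper are my own illustration and are not part of libscws.swc; the sketch only assumes that the input is the String returned by "scws_send_text_AS3()".

```actionscript
package
{
    /**
     * Hypothetical helper (not part of libscws.swc) for post-processing
     * the space-delimited String returned by scws_send_text_AS3().
     */
    public class SegmentedText
    {
        /**
         * Splits a segmentation result on runs of whitespace and
         * returns the non-empty pieces as an Array of Strings.
         */
        public static function toWords(segmented:String):Array
        {
            var words:Array = [];
            // String.split() accepts a RegExp, so consecutive spaces
            // do not produce empty entries in the middle of the list.
            for each (var piece:String in segmented.split(/\s+/))
            {
                // Leading or trailing whitespace still yields an empty
                // piece at the ends, so skip those explicitly.
                if (piece.length > 0)
                {
                    words.push(piece);
                }
            }
            return words;
        }
    }
}
```

With the library initialized, "SegmentedText.toWords(scws_send_text_AS3(inputText))" would then give one array entry per word. Note that spaces already present in the input cannot be told apart from the delimiters added by the segmenter.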

Here is the demo (type some text into the box at the bottom and press the Return key to send it to the console):



Full source code of the demo and the lib:
https://flaswf.googlecode.com/svn/trunk/LibSCWS

Links:
http://www.xunsearch.com/scws/
http://nlp.stanford.edu/software/segmenter.shtml
http://ictclas.org/index.html
http://technology.chtsai.org/mmseg/
http://www.coreseek.cn/opensource/
https://github.com/fxsjy/jieba

Saturday, October 19, 2013

Simple HTML Page Creator in AS3

I wrote this simple HTML page generation tool for the small Flash game portal site http://play.flaswf.tk. Fill in a few embedding parameters for the swf, such as the URL, width and height, and it produces a simple page for the game.

Nothing complicated here. The UI uses MinimalComps, and the AS3 code only does some string replacement on an HTML template.
Two things to note:
First, for a multi-line string in AS3 (e.g., the string variable in my code that holds the HTML template), we can use a "CDATA" section:

//http://dougmccune.com/blog/2007/05/15/multi-line-strings-in-actionscript-3/
private var myString:String = ( <![CDATA[
    Here is my string 
    that spans multiple 
    lines.
    ]]> ).toString();

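Second, the built-in AS3 String replace function replaces only the first match when the pattern is a plain string (a RegExp pattern with the g flag replaces every match). For a literal search string, a small custom function is an easy way to replace all the occurrences: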
//http://actionscriptsnips.blogspot.com/2009/07/search-and-replace.html
  private function strReplace(str:String, search:String, replace:String):String
  {
   return str.split(search).join(replace);
  }
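
To show how the split/join trick fits the template use case, here is a minimal sketch that fills several placeholders in one pass over a set of values. The "{name}" placeholder syntax, the "TemplateFiller" class and the sample embed template are assumptions made up for this illustration; they are not taken from the tool's actual source.

```actionscript
package
{
    /**
     * Hypothetical template helper (illustration only): replaces every
     * "{key}" placeholder in a template with the matching value.
     */
    public class TemplateFiller
    {
        /** Replaces all occurrences of a literal search string. */
        public static function replaceAll(str:String, search:String, replacement:String):String
        {
            return str.split(search).join(replacement);
        }

        /**
         * Fills the template from a plain Object whose property names
         * are the placeholder names, e.g. {url: "game.swf", width: 640}.
         */
        public static function fill(template:String, values:Object):String
        {
            var result:String = template;
            for (var key:String in values)
            {
                // Placeholders are assumed to look like {key}.
                result = replaceAll(result, "{" + key + "}", String(values[key]));
            }
            return result;
        }
    }
}

// Usage sketch (the template text is made up for this example):
//   var html:String = TemplateFiller.fill(
//       '<embed src="{url}" width="{width}" height="{height}">',
//       {url: "game.swf", width: 640, height: 480});
```

Because the placeholders are replaced one after another, a value that itself contains something like "{width}" could be rewritten by a later pass, so this approach suits simple, trusted values such as URLs and numbers.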
 
Finally, you can try the tool here:
http://play.flaswf.tk/SubmitYourGame.html

And the source code:
http://flaswf.googlecode.com/svn/trunk/HTMLWebCreator/
