Coding

An important part of doing research in computer science involves coding. Implementing ideas and algorithms to verify a hypothesis, or visualizing certain aspects of data, involves getting your hands dirty. Especially in the field of information retrieval, practical and system-based evaluation of novel retrieval models and algorithms is essential. As such, I have implemented all of the mathematical models developed in my research using a variety of languages, frameworks, and technologies, including (but not limited to) C++, Java, Perl, and Hadoop. Over the years I’ve coded quite a number of things, ranging from Web 2.0 interfaces to C++ libraries. Below you can find a sample of these. In case you’re interested in the implementation of a particular model in a paper of mine, don’t hesitate to ask.

AIDA toolkit

I was the main developer of the AIDA toolkit from 2007 through 2009. It is a suite of tools with which to extract, store, and retrieve information from textual documents. In particular, it uses text mining techniques to populate an RDF knowledge base, which in turn is used to improve information access to the source documents. The main language is Java, and we use a service-oriented architecture (SOA) in which each component is exposed as both a SOAP and a REST web service. The web services were used by four project partners (companies and/or other universities), who integrated our tools into their workflows. I also developed several clients using HTML, JavaScript, and Servlets that integrate and aggregate the web services in a common interface. The development team had five members, four of whom actively contributed code. It was my responsibility to make sure all components were thoroughly tested, fully functional, and interoperable. The main operating systems I used for this project were Linux (CentOS and Red Hat) and Mac OS X, although I also commonly used Windows for cross-platform testing. All in all I have produced thousands of lines of code for this project; see http://www.adaptivedisclosure.org/aida for more information. A demo can be found here: http://ws.adaptivedisclosure.org/search/.

ILPS Lucene

ILPS Lucene is a heavily modified version of Apache Lucene that replaces Apache Lucene’s heuristic retrieval model with an implementation of the multinomial language modeling framework for information retrieval. Apache Lucene is highly optimized for its own retrieval model, so adding “common” language modeling calculations required big rewrites of the (Java) code. See http://ilps.science.uva.nl/resources/lm-lucene for more information.
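
To give a feel for the kind of calculation involved, here is a minimal sketch of query-likelihood scoring under a multinomial language model. It is written in Python for brevity rather than Java, and it is not the ILPS Lucene code: the choice of Dirichlet smoothing, the parameter mu, and the function name query_log_likelihood are assumptions made purely for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_terms, collection_tf,
                         collection_len, mu=2000.0):
    """Log P(query | document) under a multinomial language model.

    Dirichlet smoothing is an assumed choice here; the smoothing
    parameter mu is hypothetical.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Background probability of the term in the whole collection.
        p_coll = collection_tf.get(term, 0) / collection_len
        if p_coll == 0.0:
            # Terms unseen in the collection are skipped (an assumption).
            continue
        # Smoothed document probability: mixes document and collection counts.
        p_term = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_term)
    return score
```

Documents are then ranked by this score per query. Each term needs both document-level and collection-level statistics at scoring time, which is the kind of information a retrieval engine tuned for a different model does not necessarily expose.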

Lucene Query Interface

This inter­face is based on SOAP Lucene and can inter­act with a Sesame repos­i­tory (also through SOAP, see AIDA Stor­age).

SOAP Lucene

This wrapper turns Lucene into a SOAP web service.

Grid­Lucene

I have implemented Grid-specific classes for Lucene that let Lucene interact with files on a Grid (both for indexing and retrieval), as described in Deploying Lucene on the Grid. They use the Jargon API extensively. Additionally, the use of Jargon makes it possible to incorporate metadata about files, directories, and/or collections transparently into Lucene. The files can be obtained here, under the same license as the one Lucene is distributed with. You will also need the Jargon and Lucene jar files. GridLucene has been tested to work with Jargon v1.4.20 and Lucene v2.0.0. If you have any questions, suggestions, and/or comments regarding GridLucene, feel free to send me an e-mail.

Lucene/Lemur/Indri Util­i­ties and Classes

I have written various tools for preprocessing, normalization, and calculations such as pointwise mutual information (PMI). One day these will all end up here.
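
As an illustration of one such calculation, the following is a minimal Python sketch of PMI computed from raw co-occurrence counts. It is not taken from the tools mentioned above; the function name pmi, its arguments, and the choice of base-2 logarithms are assumptions for the sake of the example.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information (base 2) from raw counts.

    count_xy: number of observations containing both x and y
    count_x, count_y: number of observations containing x (resp. y)
    total: total number of observations
    """
    if min(count_xy, count_x, count_y) <= 0 or total <= 0:
        raise ValueError("all counts must be positive")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    # PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
    return math.log2(p_xy / (p_x * p_y))
```

A value of zero means x and y co-occur exactly as often as independence would predict; positive values indicate that they occur together more often than chance.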

Par­si­mo­nious Implementation

Please send me an e-mail if you’re inter­ested in obtain­ing a copy of the code (based on Lemur/Indri) that I used for the par­si­mo­niza­tion experiments.

OTRS Sta­tis­tics

I did my master’s internship project at a web hosting company called Hostnet, located in Amsterdam. My assignment was to perform an in-depth statistical and qualitative analysis of the response times of the various departments, based on the open source ticketing system OTRS. To this end I’ve written a Perl module that performs several statistical functions, based on OTRS data.

Caveat

Mind you, it was the first major Perl script I ever wrote, so it’s definitely not optimized. The script is also biased towards the specific environment of Hostnet; for example, it only supports a MySQL database. However, I do believe some of the ideas may be of use to the interested reader. It has been tested with OTRS v1, but should port to v2. Please note that the most recent versions promise to offer a new statistical framework, built directly into OTRS.

Sta­tis­tics

The script itself is able to pro­duce the fol­low­ing ticket sta­tis­tics, aggre­gated on a per-day or per-week basis, between spec­i­fied dates and in spec­i­fied queues:

  • Num­ber of new tick­ets (total/calls/e-mails)
  • Workload (number of calls, incoming and outgoing e-mails)
  • Effi­ciency (time calculations):
    • Open time
    • Reply time
    • Res­o­lu­tion time
  • Effi­cacy:
    • Aver­age num­ber of follow-ups/replies per ticket
    • Num­ber of first time fixes

The out­put can be selected as well; graphs, CSV and HTML (tables) are sup­ported. The pack­age can be obtained upon request.
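
The aggregation described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl module itself: the ticket record layout (the keys queue, created, first_reply, and closed) and the function names period_key and aggregate are hypothetical, and the real script reads its data from the OTRS MySQL database rather than from in-memory dictionaries.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def period_key(moment, per="day"):
    """Bucket label: ISO date for per-day, ISO year-week for per-week."""
    if per == "week":
        year, week, _ = moment.isocalendar()
        return f"{year}-W{week:02d}"
    return moment.date().isoformat()


def aggregate(tickets, start, end, queues, per="day"):
    """Count new tickets and average reply/resolution hours per period.

    Only tickets created in [start, end) and in one of the given
    queues are considered. The record layout is hypothetical.
    """
    buckets = defaultdict(list)
    for t in tickets:
        if t["queue"] in queues and start <= t["created"] < end:
            buckets[period_key(t["created"], per)].append(t)

    stats = {}
    for key, group in sorted(buckets.items()):
        # Hours from creation to first reply, for tickets that got one.
        replies = [(t["first_reply"] - t["created"]).total_seconds() / 3600
                   for t in group if t.get("first_reply")]
        # Hours from creation to closing, for tickets that were closed.
        resolutions = [(t["closed"] - t["created"]).total_seconds() / 3600
                       for t in group if t.get("closed")]
        stats[key] = {
            "new_tickets": len(group),
            "avg_reply_hours": mean(replies) if replies else None,
            "avg_resolution_hours": mean(resolutions) if resolutions else None,
        }
    return stats
```

Calling aggregate with per="week" groups the same tickets by ISO week instead of by calendar day; the resulting dictionary could then be rendered as a graph, CSV, or an HTML table.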

Exam­ple

An exam­ple graph, cre­ated with this package:

[Figure: workload graph]

19/10/2011