About Me

PhD Candidate at Purdue University, Computer Science.

Tuesday, November 06, 2007

Configure libtiff for Visual C

I had to move my working environment to Windows to integrate with my organization. One of the steps was configuring libtiff to work with Visual C++.

To use libtiff in an existing VC++ 6.0 project/workspace, make the following changes to the settings:

  1. Open your existing project/workspace file in VC++ 6.0.
  2. Choose the "Project" > "Settings..." menu item.
  3. Click the "C/C++" tab.
  4. Choose the "Preprocessor" item in the "Category:" menu.
  5. Choose the "All Configurations" item in the "Settings for:" menu.
  6. Add the path to the include directory of the libtiff folder to the "Additional Include Directories:" field.
  7. Choose the "Code Generation" item in the "Category:" menu.
  8. Choose the "Win32 Release" item in the "Settings for:" menu.
  9. Select the "Multithreaded" option in the "Use run-time library:" menu.
  10. Choose the "Win32 Debug" item in the "Settings for:" menu.
  11. Select the "Debug Multithreaded" option in the "Use run-time library:" menu.
  12. Click the "Link" tab in the dialog.
  13. Choose the "Input" item in the "Category:" menu.
  14. Choose the "All Configurations" item in the "Settings for:" menu.
  15. Add the path to the lib folder of the libtiff folder to the "Additional library path:" field.
  16. Choose the "Win32 Release" item in the "Settings for:" menu.
  17. Add libtiff.lib to the "Object/library modules:" field.
  18. Choose the "Win32 Debug" item in the "Settings for:" menu.
  19. Add dlibtiff.lib to the "Object/library modules:" field.
  20. Click "OK".

Saturday, November 03, 2007

Installing Windows, Ubuntu7.10, Mac OS 10.4.10 on MacBook (Triple Boot)

I am happy with my MacBook running Mac OS X 10.4.10. I can't stand working on Windows any more; I feel more comfortable with Mac OS.

My master's thesis runs in a Linux environment (originally Fedora, but I successfully moved it to Ubuntu).
My current project has to be delivered on Windows. An easy solution, costing $60, would be to purchase Fusion.
Although Fusion looks very interesting, I didn't go for this solution, for the following reasons:
  1. Running virtual machines consumes more memory, so I would lose some performance, which I will certainly need during development.
  2. More memory usage means more power consumption, hence less battery life. Normally I enjoy ~4 hrs of battery life with my lovely Mac. This is something I can't stand to lose.
  3. To overcome the first problem, I could extend my RAM, but that would make the virtual machine software cost me almost $(60+90). I really can't afford this for now. (I haven't been paid my salary for 3 months.)
So, I decided to create a triple boot on my MacBook. It was a very risky step for me. But here we go, I have nothing to lose anyway (keeping in mind the $150 :S, 900 LE when converted to our local currency!)

My Mac specs are:
Processor: 2.16GHz Intel Core 2 Duo
Memory: 1 GB 667 MHz DDR2 SDRAM
Mac OS X: 10.4.10

The goal is to install Ubuntu 7.10 and Windows XP SP2 on the MacBook.
  1. Get Boot Camp. I think it may now force you to update to Mac OS X 10.5 (luckily, I installed it a month ago, before Leopard was released).
  2. Update your Mac OS.
  3. Install rEFIt.
  4. Run Boot Camp Assistant and follow the instructions to burn the driver CD for Windows. (Don't proceed with the installation steps.)
  5. Back up your data. (You should not lose your data if things go smoothly.)
  6. Check your disk partitions and identify the Mac partition. In most cases it is /dev/disk0s2, but if you are not sure, you can verify this by running diskutil in a shell:
    $ diskutil list
    Resize your HDD with diskutil by running the following command. First you specify the volume to be resized and its new size, then, for each new volume, its type, its name, and its size.
    $ diskutil resizeVolume /dev/disk0s2 70G "Linux" "Linux" 20G "MS-DOS FAT32" "Windows" 20G
  7. Insert your XP SP2 CD and hold down the "ALT" key while the machine boots.
  8. Install XP on the partition you created for Windows; just give it a quick FAT32 format.
  9. You should now have a dual boot (Windows and Mac).
  10. Insert your Ubuntu 7.10 Live CD.
  11. Run the installation normally. You should set up the partitions manually. Don't mount the EFI system partition. You only need to mount / on the partition you allocated to your Linux installation. I didn't create any swap space; I just don't need it for now, so I rely on my 1 GB of RAM.
  12. Continue through the remaining installer steps.
  13. When you reboot, you should have a triple boot.

Extracting metadata from PDFs

I have been assigned to create a DB from PDF files. Typical fields are title, location, date, actors mentioned in each PDF, etc.
Plan A:
The first idea that came to mind was to create a lexical analyser.
I would parse the PDF file word by word: if a word is "Held", I should expect a location description after it; if I find "Present", I should expect a list of actors' names.
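
To make the idea concrete, here is a minimal sketch of such a keyword-driven scanner. It is only an illustration, not the code I actually ran: the trigger table, the field names and the sample sentence are made up, and the words are assumed to arrive already split and in reading order.

```python
# Hypothetical keyword table: trigger word -> DB field it introduces.
TRIGGERS = {"held": "location", "present": "actors"}

def extract_fields(words):
    """Scan words in order; text after a trigger word is stored under its field."""
    fields = {}
    current = None  # field currently being filled, if any
    for word in words:
        key = word.strip(".,:;").lower()
        if key in TRIGGERS:
            current = TRIGGERS[key]   # a trigger switches the active field
            fields[current] = []
        elif current is not None:
            fields[current].append(word)
    return {name: " ".join(parts) for name, parts in fields.items()}

# Made-up sample text, for illustration only.
sample = "Meeting Held in Geneva Present: Alice Martin, Bob Lee".split()
print(extract_fields(sample))
```

Anything that appears before the first trigger word is simply ignored, so a real version would need more triggers (or a default field) to catch things like the title.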

To parse the text of the PDF files I used PDFBox. Surprisingly, it didn't work as I expected.
  1. The PDF parser can split a single word into multiple words, so my code would receive "hel" then "d", and not "held". To make it work in this case I would have to build a state machine that keeps track of the history, which I found to be an infeasible solution.
  2. I have French documents among these files. Applying a text analyser to French documents is a kind of stupidity: in French, instead of saying "hold on", they type "le". "Le" is a very frequent string (if you are not familiar with French, it is equivalent to "the").
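
For the first problem, the "keep track of the history" idea could look roughly like the sketch below: glue a few adjacent fragments together and check whether the result is one of the trigger words from plan A ("held", "present"). The function name and the `max_parts` limit are assumptions made for the example. It only shows the mechanics, and it does nothing for the French "le" problem.

```python
KEYWORDS = {"held", "present"}  # the trigger words mentioned in plan A

def rejoin_fragments(tokens, max_parts=3):
    """Merge up to max_parts adjacent tokens when their concatenation is a keyword."""
    out = []
    i = 0
    while i < len(tokens):
        merged = False
        # Try the longest merge first, then shorter ones.
        for n in range(max_parts, 1, -1):
            parts = tokens[i:i + n]
            candidate = "".join(parts)
            if len(parts) == n and candidate.lower() in KEYWORDS:
                out.append(candidate)
                i += n
                merged = True
                break
        if not merged:
            out.append(tokens[i])  # no keyword starts here: keep the token as is
            i += 1
    return out

print(rejoin_fragments(["Meeting", "hel", "d", "in", "Geneva"]))
```

The weakness is easy to see: every extra keyword and every extra fragment length multiplies the combinations to check, and unrelated neighbouring fragments can be glued into a keyword by accident.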
Plan B:
Take the positions of the text into consideration. From them I can build a set of blocks, and based on this structure I can isolate the different logical blocks and then separate their contents.
This didn't work :( The problem this time was with the OCR that generated these PDFs. The OCR is not 100% accurate with text positioning, hence the input to my code is buggy. As a result, the OCR merges two different columns, or splits one column into several columns.
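
A minimal sketch of the position-based grouping, under stated assumptions: each text fragment comes as an `(x, y, text)` tuple with y growing downwards, and a new column starts wherever the left edge jumps by more than a chosen `gap`. The tuple layout, the threshold and the sample data are all invented for the example.

```python
def group_into_columns(fragments, gap=50.0):
    """Group (x, y, text) fragments into columns wherever the left edge jumps by more than gap."""
    columns = []
    last_x = None
    for x, y, text in sorted(fragments):
        if last_x is None or x - last_x > gap:
            columns.append([])  # left edge jumped: start a new column
        columns[-1].append((x, y, text))
        last_x = x
    # Read each column top to bottom.
    return [[t for _, _, t in sorted(col, key=lambda f: f[1])] for col in columns]

# Made-up fragments: two columns whose left edges are slightly jittered.
fragments = [
    (72.0, 100.0, "Title:"), (300.0, 100.0, "Annual"),
    (74.0, 120.0, "Date:"), (302.0, 120.0, "2007"),
]
print(group_into_columns(fragments))
```

With clean positions this separates the two columns. With inaccurate positions a fragment can land in the gap and chain two columns into one, or a shifted line can open a column of its own, which is the kind of failure described above.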

Plan C:
Work on the TIFF files directly (ignoring the buggy PDFs).
I read the original TIFF file, then parse it pixel by pixel to split the original TIFF into sub-TIFFs. This way I have control over the blocks without needing the OCR to do this for me.
After identifying the basic blocks in the TIFF file, I call an OCR SDK to transform my TIFFs into text, which finally becomes my fields in the DB.
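
One simple way to do the pixel-by-pixel split is to cut the page wherever a fully blank row separates two inked regions. The sketch below assumes the page has already been decoded into rows of 0 (white) / 1 (ink) values; reading the TIFF and calling the OCR SDK are left out, and the tiny `page` is made up.

```python
def split_rows(image):
    """Return (start, end) row ranges of ink separated by fully blank rows.

    image is a sequence of rows, each a sequence of 0 (white) / 1 (ink) pixels.
    """
    blocks = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                  # first inked row of a new block
        elif not has_ink and start is not None:
            blocks.append((start, y))  # blank row closes the block (end exclusive)
            start = None
    if start is not None:
        blocks.append((start, len(image)))  # block runs to the bottom edge
    return blocks

# Made-up 5x4 page: two inked regions separated by a blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
print(split_rows(page))
```

The same function can look for blank vertical gaps if it is given the transposed image, e.g. `split_rows(list(zip(*page)))`. Scanned pages are rarely perfectly clean, so a real version would probably need a noise threshold instead of `any(row)`.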

Friday, November 02, 2007

Joke of the year

While I was working on my Mac... I was concentrating hard and stressed...
I was trying to burn a CD... Well, it was my first time burning a CD on my MacBook...

I tried inserting the CDs for 20 minutes (I tried 10 blank CDs), but nothing happened, so I decided that I must have corrupted blank CDs and that I should buy some new ones.
Luckily, I noticed that all that time I had been inserting the blank CDs into my DESKTOP DVD drive, not my MacBook... :D
I must have lost my mind somewhere...