Word doc search
- Apache
- Mandrakelinux
I modified the standard /includes/SpecialUpload.php file so that the contents of uploaded Microsoft Word documents (those with the .doc extension) can be indexed and searched.
I started by downloading and installing the Antiword tool (I installed it from this RPM). It is a Linux command-line utility that converts the text of a Word document to plain text and writes it to standard output.
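For illustration, calling Antiword from PHP could look like the sketch below. The function name extractDocText is hypothetical and not part of MediaWiki, the /usr/bin/antiword path is taken from the patch on this page, and the code assumes PHP 7.1 or later and that Antiword reports failure with a non-zero exit status.

```php
<?php
// Hypothetical helper (not part of MediaWiki): run antiword on a saved
// upload and return its output lines, or null if the conversion failed.
function extractDocText(string $path, string $antiword = '/usr/bin/antiword'): ?array
{
    if (!is_executable($antiword)) {
        return null; // antiword is not installed at the assumed location
    }
    // -s includes text marked as hidden. escapeshellarg() quotes each
    // argument so spaces or shell metacharacters in a filename cannot
    // change the command that gets run.
    $command = escapeshellarg($antiword) . ' -s ' . escapeshellarg($path);
    $lines = [];
    $status = 0;
    exec($command, $lines, $status);
    // Assumption: a non-zero exit status means antiword could not read the file.
    return $status === 0 ? $lines : null;
}
```

Returning null rather than an empty array lets the caller tell "conversion failed" apart from "document contains no text".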
Then I modified SpecialUpload.php at the point where it tests for a successful upload, just before it inserts the uploaded file's information into the database. The change places the text of the Word document (including 'hidden' text, which is what antiword's -s parameter enables) inside an HTML comment block in the description text of the image's file page.
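The comment-wrapping step can be sketched in isolation as follows. The function name appendHiddenText is hypothetical, and the code assumes PHP 7 or later. It repeats the "-->" removal until none is left, because a single pass can splice a new "-->" together out of the surrounding characters (for example in "---->>"), which would close the comment early.

```php
<?php
// Hypothetical helper: append extracted document text to an upload
// description as an HTML comment, so the text is stored with the page
// but hidden when the page is displayed.
function appendHiddenText(string $description, array $lines): string
{
    $text = $description . "\r\n<!-- ";
    foreach ($lines as $line) {
        // Remove every "-->" so the document text cannot end the comment.
        // Loop because one removal can create a new "-->" from its neighbours.
        while (strpos($line, '-->') !== false) {
            $line = str_replace('-->', '', $line);
        }
        $text .= "\r\n" . $line;
    }
    return $text . "\r\n -->";
}

// Example: the second line would otherwise leave a stray "-->" behind.
echo appendHiddenText('Spec sheet', ['First line', '---->>']);
```

Keeping this step as a pure function of the description and the extracted lines makes it easy to check without having Antiword installed.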
To search the text on an image's page, a user must first change their preferences so that searches include Images.
 if( $this->saveUploadedFile( $this->mUploadSaveName,
                              $this->mUploadTempName,
                              !empty( $this->mSessionKey ) ) ) {
     /**
      * Update the upload log and create the description page
      * if it's a new file.
      */
     # MHART replace $textdesc with <!-- text from doc if .d
     if ( strtolower( $finalExt ) == "doc" ) {
         $NewDesc = $this->mUploadDescription . "\r\n" . "<!-- ";
         # -s makes antiword include hidden text; escapeshellarg() keeps
         # spaces or shell metacharacters in the filename from altering the command
         $toexec = "/usr/bin/antiword -s " . escapeshellarg( $this->mSavedFile );
         exec( $toexec, $DocText );
         foreach ( $DocText as $DocLine ) {
             # Strip "-->" so the document text cannot close the comment early
             $NewDesc .= "\r\n" . str_replace( "-->", "", $DocLine );
         }
         $NewDesc .= "\r\n" . " -->";
     } else {
         $NewDesc = $this->mUploadDescription;
     }
     ####
     wfRecordUpload( $this->mUploadSaveName, $this->mUploadOldVersion,
                     $this->mUploadSize, $NewDesc, # MHART - this line has been changed
                     $this->mUploadCopyStatus, $this->mUploadSource );
     $this->showSuccess();
 }
My actual script is a bit different because I'm handling other file types in a similar fashion. Here's the combined documentation.
--MHart 17:54, 23 Apr 2005 (UTC)