Postscript to Text converter

Chris Lewis clewis at ferret.ocunix.on.ca
Tue May 28 10:59:38 AEST 1991


In article <9105262212.AA29690 at ucbvax.Berkeley.EDU> gtoal at tardis.computer-science.edinburgh.ac.uk writes:
>In article <1991May26.181915.14910 at elroy.jpl.nasa.gov> mathew at jane.Jpl.Nasa.Gov (Mathew Yeates) writes:
>>In article <1991May26.063129.26177 at netcom.COM> nagar at netcom.COM ( Nagar) writes:
>>>I am looking for a postscript to
>>>text converter, is there such a
>>>program available through
>>>ftp from simtel20 or some other
>>>site?

>This is going to sound silly, but the best way of getting what you
>want is to print out your postscript and scan it back in!

That only works if you have a scan-to-text converter rather than simply
a raster reader.

>If you're a real hacker, get the Ghostscript sources and hack them
>to output any text to a data structure instead of the bitmap, and
>do an x-y sort on your data structure.  Modulo superscripts and
>subscripts, you might have a chance of reconstructing lines.

You can do this without Ghostscript.  I've taken the output of
various text processors and reconstructed an ASCII version using
perl (this is also doable in awk).  You need to search for the (x,y)
coordinate settings, and translate these into row and column positions,
and then "drop" the strings enclosed in parentheses at that position.
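
Roughly, the approach looks like the sketch below.  It is in Python rather
than perl or awk, and it is not the script I actually used.  It assumes the
PostScript places text with literal "x y moveto (string) show" sequences, a
fixed-pitch font 6 points wide, 12-point line spacing, and a 792-point
(US Letter) page.  PLACED, CHAR_W, LINE_H, PAGE_H and ps_to_text are all
made-up names; a real file will usually wrap moveto/show in the driver's
own procedures, so the pattern has to be adapted per text processor.

```python
import re

# Assumed input shape: "x y moveto (text) show".  Real drivers usually hide
# these behind their own procedure names, so this pattern needs adjusting.
PLACED = re.compile(
    r'(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+moveto\s*'
    r'\(((?:\\.|[^\\()])*)\)\s*show'
)

CHAR_W = 6.0    # assumed width of one column, in points
LINE_H = 12.0   # assumed height of one row, in points
PAGE_H = 792.0  # assumed page height (US Letter), in points


def ps_to_text(ps):
    """Drop each placed string onto a row/column grid and return the text."""
    rows = {}  # row number -> list of characters on that row
    for x, y, s in PLACED.findall(ps):
        # Only \( \) and \\ are decoded; octal and other escapes are left alone.
        s = re.sub(r'\\([()\\])', r'\1', s)
        # PostScript's origin is bottom-left, so flip y to count rows from the top.
        row = round((PAGE_H - float(y)) / LINE_H)
        col = max(0, round(float(x) / CHAR_W))
        line = rows.setdefault(row, [])
        # Pad with blanks up to the end of the new string, then overwrite in place.
        line.extend(' ' * (col + len(s) - len(line)))
        line[col:col + len(s)] = list(s)
    if not rows:
        return ''
    return '\n'.join(
        ''.join(rows.get(r, [])).rstrip()
        for r in range(min(rows), max(rows) + 1)
    )


if __name__ == '__main__':
    sample = (
        "72 720 moveto (Hello) show\n"
        "144 720 moveto (world) show\n"
        "72 708 moveto (second line) show\n"
    )
    print(ps_to_text(sample))
```

Keeping the rows in a dictionary means the strings can arrive in any order
and still land in the right place on the page.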

The hard parts are reverse line motion in the PostScript (which requires
you to buffer a whole page) and point sizes that vary a lot.  Of course,
this approach won't handle graphics and other stuff, but as long as your
script's scanner is reasonably accurate in snagging only the x,y and text
display commands, it'll work well enough.
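
For the reverse-motion case, one way to do the buffering is sketched below
(again Python, again only an illustration).  It assumes you have already
pulled out (x, y, text) fragments for one page, and that baselines within
3 points of each other belong to the same line; flush_page and line_tol are
made-up names and the tolerance is a guess you would tune.  Because a fixed
column grid is unreliable when point sizes vary, this version only keeps
the left-to-right order and joins fragments with single spaces.

```python
def flush_page(fragments, line_tol=3.0):
    """Turn buffered (x, y, text) fragments into lines, top to bottom.

    Fragments may arrive in any order (e.g. after reverse line motion);
    sorting the whole page by descending y restores reading order.
    """
    lines = []  # each entry: [baseline_y, [(x, text), ...]]
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if lines and abs(lines[-1][0] - y) <= line_tol:
            # Close enough to the current baseline: same output line.
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within a line, order by x so nearby baselines don't scramble the words.
    return [' '.join(t for _, t in sorted(parts)) for _, parts in lines]


if __name__ == '__main__':
    page = [
        (72, 700, 'second'),   # emitted first, but lower on the page
        (72, 720, 'first'),
        (200, 719, 'line'),    # baseline off by a point, same line
        (120, 700, 'row'),
    ]
    for line in flush_page(page):
        print(line)
```

You would call this once per page, each time the scanner hits a showpage,
and then empty the buffer.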

If you're familiar with awk or perl, you can usually whomp one of these
things up in about an hour.  Sorry I didn't save the one I did for someone
else on the net.
-- 
Chris Lewis, Phone: (613) 832-0541, Domain: clewis at ferret.ocunix.on.ca
UUCP: ...!cunews!latour!ecicrl!clewis; Ferret Mailing List:
ferret-request at eci386; Psroff (not Adobe Transcript) enquiries:
psroff-request at eci386 or Canada 416-832-0541.  Psroff 3.0 in c.s.u soon!


