So I was looking for a tool to convert between various document formats (i.e. .tex to .doc, .html, .pdf, .odt or any variation thereof), and I found it surprisingly difficult to find one that converts to .odt. Converting from pretty much anything to either HTML or PDF was a breeze; converting to other formats was more difficult.
To convert from .tex to .html, I chose to use latex2html rather than tex4ht because I found the output to be cleaner.
Conversion from .tex to .pdf was quite a bit easier. There were a few standard options: I could run latex, dvips, and ps2pdf; or latex and dvipdfm; or just pdflatex. For some reason, when I used these methods the hyperlinked table of contents that I created (via hyperref and \tableofcontents) wouldn't show up. I chose to use rubber instead. Rubber is a wrapper for LaTeX and some companion programs. With the '-d' option, I was able to make PDFs out of the .tex files that included the hyperlinked table of contents. I don't really understand why this worked and the other methods didn't, but it did.
Now came the hard part. Converting the .tex files into .odt or .doc appeared to be near impossible to do cleanly. The best option I had heard of was to convert first into HTML, load that into Office, and then save it in the desired format. I found this to work out extremely well. However, I had intended to automate the whole process of document conversion with a script, so this method was not very good for me, especially since neither Microsoft Office nor OpenOffice.org had a very good command line interface.
This was when I discovered a program called JODConverter. This is a Java program that uses OpenOffice to convert from one format to another. While this does mean that I probably could have found a way to use OpenOffice directly via the command line, who was I to complain when there was a program out there to do it for me =D.
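For reference, the three standard .tex to .pdf routes mentioned above can be sketched like this. The function only prints the commands rather than running them, and "Master.tex" is just a placeholder file name. One thing worth knowing: \tableofcontents is filled in from a .toc file written on the previous run, so the latex or pdflatex step generally has to be run twice. Rubber reruns LaTeX as often as needed, which may be why it produced the linked contents, though that is a guess and not something confirmed here.

```shell
#!/bin/bash
# Print, rather than run, the three standard .tex -> .pdf routes.
# "Master.tex" below is only a placeholder file name.
pdf_routes() {
  tex=$1
  base=${tex%.tex}
  echo "latex $tex && dvips -o $base.ps $base.dvi && ps2pdf $base.ps"
  echo "latex $tex && dvipdfm $base.dvi"
  echo "pdflatex $tex"
}

pdf_routes Master.tex
```

Each printed line is one complete route; to get a table of contents, repeat the latex or pdflatex step before the later steps.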
In the end I wrote a small BASH script to help me with the conversions.
NOTES:
-written in BASH because I'm using Arch Linux
-uses zenity to provide a GUI, but that is not really necessary
-while this is tailored to my LaTeX usage it can probably be adapted for anything, although just using JODConverter is probably better if LaTeX is not an issue
#!/bin/bash
# Folder holding the .tex sources.
dir=/home/ray/Documents/Novel/Tex
cd "$dir" || exit 1
# Ask which file to convert (generated files and archive folders are hidden).
response=$(zenity --list --title="Choose File" --column=File \
$(ls --hide=*.pdf --hide=*.odt --hide=*.html --hide=Revisions --hide=Output "$dir") )
# Ask which formats to produce; zenity returns the ticked rows joined by '|'.
input=$(zenity --list --title="Choose File Type" --checklist --column=Files --column=Description \
TRUE PDF \
TRUE HTML \
TRUE ODT \
TRUE DOC)
# File name without its .tex extension.
base=${response%.*x}
# Build a single-page HTML file and turn the "file.html#anchor" links into "#anchor".
make_html() {
  latex2html -split 0 -no_navigation -dir "$dir" "$response"
  sed "s_${base}.html#_#_" <"$base.html" >"$base.html.new"
  rm index.html "$base.html"
  mv "$base.html.new" "$base.html"
}
# Convert the HTML file to the format given as $1 (odt or doc) through OpenOffice.
convert_html() {
  make_html
  soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
  jodconverter "$dir/$base.html" "$dir/$base.$1"
  pkill soffice
}
IFS='|' ; for word in $input ; do
  case $word in
    PDF) rubber -f -s -d "$dir/$response";;
    HTML) make_html;;
    ODT) convert_html odt;;
    DOC) convert_html doc;;
  esac
done
unset IFS
# Remove intermediate files, then copy the results into a timestamped folder.
rm $(ls --hide=*.tex --hide=*.sh --hide=*.html --hide=*.odt --hide=*.pdf --hide=*.doc --hide=Output --hide=Revisions)
out="Output/$(date +%F-%R)"
mkdir -p "$out"
cp $(ls --hide=*.tex --hide=Revisions --hide=Output) "$out/"
#
#echo "-------------------- EXTRA STEPS --------------------"
#echo "1. Open HTML with OpenOffice.org Writer"
#echo "2. Add first-line indent"
#echo "3. Save file as Master.odt"
#echo "4. Export Master.odt to GoogleDocs"
#echo "-----------------------------------------------------"
#cd ~
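Three bits of shell in the script above are easy to misread: the ${response%.*x} expansion that strips the .tex extension, the IFS='|' loop that splits zenity's answer, and the date format used for the archive folder. Here is a minimal sketch with made-up values standing in for what zenity would return, so it runs without zenity or LaTeX installed.

```shell
#!/bin/bash
# Made-up stand-ins for the two zenity answers.
response="Master.tex"
input="PDF|HTML|ODT"

# %.*x removes the shortest trailing match of ".*x", here ".tex".
base=${response%.*x}
echo "base name: $base"

# With IFS set to '|', the unquoted $input splits into one word per format.
count=0
old_ifs=$IFS
IFS='|'
for word in $input; do
  count=$((count + 1))
  echo "format $count: $word"
done
IFS=$old_ifs

# The archive folder name comes from date: %F is YYYY-MM-DD, %R is HH:MM.
stamp=$(date +%F-%R)
echo "archive folder: Output/$stamp"
```

Saving and restoring IFS, as done here, is a slightly more careful alternative to unsetting it afterwards.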
Showing posts with label LaTeX. Show all posts
Wednesday, September 22, 2010
Monday, July 12, 2010
Update
Hello my non-existent readers. I would just like to give everyone an update. I've changed the layout of the blog a bit. I've added a page where you can access the completed story (well completed so far). I will keep the link to the GoogleDocs version of the story in case you wish to download/edit/distribute the story.
Next section of the novel is forthcoming.
Warning: Techy stuff ahead
If you've read my previous post, A Note on Writing Software, you'll know that I am currently using LaTeX to write my story. There are many tools that I use. The three main programs are pdflatex (makes PDFs), oolatex (makes ODT [OpenDocument Text] files), and latex2html (makes an HTML file). The ease with which I can convert into all of these file types is one reason why I use LaTeX. Recently though, I discovered something. For those of you who know, oolatex is a part of TeX4ht, which is itself capable of making an HTML file via the htlatex command. Now I have found that when copy/pasting the HTML produced by the htlatex command into Blogspot, the paragraphs get all messed up. This is why I turned to latex2html. The only caveat with latex2html is that the Table of Contents hyperlinks do not work, because they are keyed to the file name. This problem was quickly solved by a few sed commands replacing the part of the HTML code that tied the links to the file. Anyways, I wrote a little bash script so that I can easily convert my .tex file into .odt, .pdf, and .html all in one go. Here is said script:
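#!/bin/bash
cd /home/ray/Documents/Novel/Tex/
# Comment out the \usepackage and \hypersetup lines before running oolatex.
sed "s_\\\usepackage_%\\\usepackage_" <Master.tex >Master.tex.new
sed "s_\\\hypersetup_%\\\hypersetup_" <Master.tex.new >Master.tex
mk4ht oolatex Master.tex
# Restore those lines for pdflatex and latex2html.
sed "s_%\\\usepackage_\\\usepackage_" <Master.tex >Master.tex.new
sed "s_%\\\hypersetup_\\\hypersetup_" <Master.tex.new >Master.tex
pdflatex Master.tex
#htlatex Master.tex
latex2html -split 0 -no_navigation -dir /home/ray/Documents/Novel/Tex/ Master.tex
# Turn the "Master.html#anchor" links into plain "#anchor" links.
sed 's_Master.html#_#_' <Master.html >Master.html.new
rm index.html
rm Master.html
mv Master.html.new Master.html
# Remove the intermediate files.
rm $(ls --hide=*.tex --hide=*.sh --hide=*.html --hide=*.odt --hide=*.pdf)
cd ~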
Now there's probably an easier way to do all of this, and I welcome suggestions, but this is the one I've settled upon for now.
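To see what the sed commands in the script actually do, here is a small sketch that runs them on single sample lines instead of whole files. The sample lines are made up for illustration (the anchor name in particular is hypothetical, not copied from real latex2html output).

```shell
#!/bin/bash
# A made-up LaTeX line and a made-up table-of-contents link.
tex='\usepackage{hyperref}'
link='<A HREF="Master.html#SECTION0010">Prologue</A>'

# Comment the package line out, then restore it again.
off=$(printf '%s\n' "$tex" | sed 's_\\usepackage_%\\usepackage_')
on=$(printf '%s\n' "$off" | sed 's_%\\usepackage_\\usepackage_')

# Drop the file name so the link points at an anchor within the same page.
fixed=$(printf '%s\n' "$link" | sed 's_Master.html#_#_')

echo "$off"
echo "$on"
echo "$fixed"
```

The underscore is used as the sed delimiter so that slashes in the pattern don't need escaping; single quotes are used here so that each backslash only has to be doubled once.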
Saturday, June 19, 2010
A Note on Writing Software
Recently, I've begun exploring alternative programs to use as a word processor. I stumbled upon an area of thought known as WYSIWYM. This acronym stands for What You See Is What You Mean. This is different from the more popular WYSIWYG (What You See Is What You Get) that is used in most writing tools today. Basically, WYSIWYM, from my understanding, is when you focus on writing rather than format. No need to worry about indents or spacing, or font or any of that hooey.
From this I began to learn the LaTeX document markup language. Learning the basics did not take long at all. In practice I found it simple to use and the results weren't bad. Once LaTeX compiled the file, the output did indeed look like an actual book. It even used roman numerals to number the prologue and then normal numbers for subsequent pages (something I find is a pain to do in Word/OpenOffice Writer). If I were to publish a paper, or even this novel, I would really consider using this tool. The file sizes are small, and adding sections, a table of contents, title pages, and bibliographies is all easy.
Of course, I don't want to paint a completely rosy picture. There are indeed disadvantages. Firstly, you do need to know the lingo and, unless you're using a program like LyX, you do need to compile the files, which is an extra step. Also, it can be said that what LaTeX accomplishes, at least in the area of page/section layouts, indents, etc., can be done through a template in Word or whatever word processor you are using. In addition to this I found that converting from .tex format to a more typical document processing format (ex. .doc, .odt, .rtf)
Despite all of this I do believe I will endure. Perks like automatic section numbering, title page generation, and table of contents are hard to pass up.
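As a rough illustration of those perks, the snippet below writes out a tiny LaTeX file using the standard book class. The file name, title, and text are all made up; it is a sketch of the general idea, not the actual novel source.

```shell
#!/bin/bash
# Write a minimal LaTeX file showing the perks mentioned above.
# The file name and all of the text are made up for illustration.
cat > perks-demo.tex <<'EOF'
\documentclass{book}
\title{My Novel}
\author{A. Writer}
\begin{document}
\frontmatter            % roman page numbers (i, ii, ...)
\maketitle              % generated title page
\tableofcontents        % generated table of contents
\chapter{Prologue}
Once upon a time.
\mainmatter             % arabic page numbers, restarting at 1
\chapter{The Beginning} % numbered automatically as Chapter 1
It began.
\end{document}
EOF
echo "wrote perks-demo.tex"
```

Running pdflatex on the file twice should fill in the table of contents, since the contents are read from a file written during the previous run.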
Note: I made a few changes to the Prologue, mostly grammatical, plus a few new lines of text. Chapter 1 will be forthcoming.
Edit: Scratch the comment about converting... it was just me being stupid.