html to doc

codewriter
Posts: 7
Joined: Wed Dec 26, 2007 2:14 pm

html to doc

Post by codewriter »

Hi,

I am working on converting HTML to .doc through a Java program. I have an HTML file that is already fully formatted, and I need to export it as a .doc file. Any information on this would be of help, ideally with a sample implementation.

Regards,
floris v
Volunteer
Posts: 4431
Joined: Wed Nov 28, 2007 1:21 pm
Location: Netherlands

Re: html to doc

Post by floris v »

Hey, it's very bad practice not to tell what you want. :evil: Be specific about what you want done and how. Don't leave people guessing - that's wasting their time. They might say: "Open the file in Writer and save it as .doc." :lol:
OpenOffice 4.1.11 on Ubuntu; LibreOffice 6.4 on Linux Mint, LibreOffice 7.6.2.1 on Ubuntu
If your problem has been solved or your question has been answered, please edit the first post in this thread and add [Solved] to the title bar.
Nederlandstalig forum
hol.sten
Volunteer
Posts: 495
Joined: Mon Oct 08, 2007 1:31 am
Location: Hamburg, Germany

Re: html to doc

Post by hol.sten »

codewriter wrote: I am working on converting html to doc thru java program.
Try these threads:
- Java: Using the Bootstrap Connection Mechanism: http://user.services.openoffice.org/en/ ... =44&t=1013
- Java: Using the Interprocess Connection Mechanism: http://user.services.openoffice.org/en/ ... =44&t=1014

But use another save filter, like "MS WinWord 6.0", instead of "writer_pdf_Export". Change

Code:

conversionProperties[0].Value = "writer_pdf_Export";
to

Code:

conversionProperties[0].Value = "MS WinWord 6.0";
Or try the filter list, if you need another conversion: http://wiki.services.openoffice.org/wik ... st_OOo_2_1
OOo 3.2.0 on Ubuntu 10.04 • OOo 3.2.1 on Windows 7 64-bit and MS Windows XP
codewriter
Posts: 7
Joined: Wed Dec 26, 2007 2:14 pm

Re: html to doc

Post by codewriter »

Hi,

I am almost there with the suggestion given for converting HTML to .doc, but it is not working completely yet.

Note: html2doc is written as a Java program that runs on a web server.

If the input URL (loadUrl) points to a file stored locally on the hard disk, the program stores the .doc file properly.

But when the input URL is a web URL, e.g. http://xyz.com, the program is not able to store the document, and I am wondering what the problem is.

I will attach the code in the next post for checking.

Any input is highly appreciated.

Regards,
Last edited by codewriter on Sat Dec 29, 2007 7:45 pm, edited 2 times in total.
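The two cases described in this post differ only in the scheme of the load URL. Below is a minimal sketch, using only the Java standard library, of how a program might build the file URL from a local path and tell the two cases apart before handing the URL to loadComponentFromURL. The class and method names (LoadUrls, toFileUrl, isRemote) are hypothetical helpers for illustration and are not part of the OpenOffice API.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LoadUrls {

    // Turn a local path into an absolute file URL (scheme "file").
    public static String toFileUrl(String localPath) {
        Path absolute = Paths.get(localPath).toAbsolutePath().normalize();
        return absolute.toUri().toString();
    }

    // True when the URL uses http or https, i.e. the "web URL" case from the post.
    public static boolean isRemote(String url) {
        String scheme = URI.create(url).getScheme();
        return "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        String local = toFileUrl("viewtopic.php.htm");
        String web = "http://www.xyz.com";
        System.out.println(local + " remote=" + isRemote(local));
        System.out.println(web + " remote=" + isRemote(web));
    }
}
```

Logging which case applies can help narrow down whether a failure is tied to the source of the document rather than to the store step.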
codewriter
Posts: 7
Joined: Wed Dec 26, 2007 2:14 pm

Re: html to doc

Post by codewriter »

This works ...
public static void bootstrap() {

    String loadUrl = "file:///c:/dev/netbeans/oootest/viewtopic.php.htm";
    // String loadUrl = "http://www.google.com";
    String storeUrl = "file:///c:/dev/netbeans/oootest/mydocoutputboot.doc";

    try {
        XComponentContext xContext = Bootstrap.bootstrap();
        XMultiComponentFactory xMultiComponentFactory = xContext.getServiceManager();
        XComponentLoader xcomponentloader = (XComponentLoader) UnoRuntime.queryInterface(XComponentLoader.class, xMultiComponentFactory.createInstanceWithContext("com.sun.star.frame.Desktop", xContext));

        PropertyValue[] conversionProperties = new PropertyValue[2];
        conversionProperties[0] = new PropertyValue();
        conversionProperties[0].Name = "FilterName";
        conversionProperties[0].Value = "MS Word 97";

        conversionProperties[1] = new PropertyValue();
        conversionProperties[1].Name = "Hidden";
        conversionProperties[1].Value = new Boolean(true);

        // Object objectDocumentToStore = xcomponentloader.loadComponentFromURL(loadUrl, "_blank", 1, new PropertyValue[0]);
        Object objectDocumentToStore = xcomponentloader.loadComponentFromURL(loadUrl, "_blank", 1, conversionProperties);

        XStorable xstorable = (XStorable) UnoRuntime.queryInterface(XStorable.class, objectDocumentToStore);
        // xstorable.storeToURL(storeUrl,conversionProperties);
        xstorable.storeToURL(storeUrl, conversionProperties);
        // Getting the method dispose() for closing the document
        // XComponent xcomponent =
        // ( XComponent ) UnoRuntime.queryInterface( XComponent.class,
        // xstorable );
        System.exit(0);
    }
    catch (java.lang.Exception e) {
        e.printStackTrace();
    }
    finally {
        System.exit(0);
    }

}


and this does not ...


public static void bootstrap() {

    String loadUrl = "http://www.xyz.com";
    String storeUrl = "file:///c:/dev/netbeans/oootest/mydocoutputboot.doc";

    try {
        XComponentContext xContext = Bootstrap.bootstrap();
        XMultiComponentFactory xMultiComponentFactory = xContext.getServiceManager();
        XComponentLoader xcomponentloader = (XComponentLoader) UnoRuntime.queryInterface(XComponentLoader.class, xMultiComponentFactory.createInstanceWithContext("com.sun.star.frame.Desktop", xContext));

        PropertyValue[] conversionProperties = new PropertyValue[2];
        conversionProperties[0] = new PropertyValue();
        conversionProperties[0].Name = "FilterName";
        conversionProperties[0].Value = "MS Word 97";

        conversionProperties[1] = new PropertyValue();
        conversionProperties[1].Name = "Hidden";
        conversionProperties[1].Value = new Boolean(true);

        // Object objectDocumentToStore = xcomponentloader.loadComponentFromURL(loadUrl, "_blank", 1, new PropertyValue[0]);
        Object objectDocumentToStore = xcomponentloader.loadComponentFromURL(loadUrl, "_blank", 1, conversionProperties);

        XStorable xstorable = (XStorable) UnoRuntime.queryInterface(XStorable.class, objectDocumentToStore);
        // xstorable.storeToURL(storeUrl,conversionProperties);
        xstorable.storeToURL(storeUrl, conversionProperties);
        // Getting the method dispose() for closing the document
        // XComponent xcomponent =
        // ( XComponent ) UnoRuntime.queryInterface( XComponent.class,
        // xstorable );
        System.exit(0);
    }
    catch (java.lang.Exception e) {
        e.printStackTrace();
    }
    finally {
        System.exit(0);
    }

}

Exception thrown ...
com.sun.star.task.ErrorCodeIOException:
at com.sun.star.lib.uno.environments.remote.Job.remoteUnoRequestRaisedException(Job.java:187)
at com.sun.star.lib.uno.environments.remote.Job.execute(Job.java:153)
at com.sun.star.lib.uno.environments.remote.JobQueue.enter(JobQueue.java:349)
at com.sun.star.lib.uno.environments.remote.JobQueue.enter(JobQueue.java:318)
at com.sun.star.lib.uno.environments.remote.JavaThreadPool.enter(JavaThreadPool.java:106)
at com.sun.star.lib.uno.bridges.java_remote.java_remote_bridge.sendRequest(java_remote_bridge.java:657)
at com.sun.star.lib.uno.bridges.java_remote.ProxyFactory$Handler.request(ProxyFactory.java:159)
at com.sun.star.lib.uno.bridges.java_remote.ProxyFactory$Handler.invoke(ProxyFactory.java:141)
at $Proxy5.storeToURL(Unknown Source)
at com.vtech.util.word.InterprocessConnectionOdtToPdfQuickAndDirty.bootstrap(InterprocessConnectionOdtToPdfQuickAndDirty.java:115)
at com.vtech.util.word.InterprocessConnectionOdtToPdfQuickAndDirty.main(InterprocessConnectionOdtToPdfQuickAndDirty.java:134)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:86)
codewriter
Posts: 7
Joined: Wed Dec 26, 2007 2:14 pm

Re: html to doc

Post by codewriter »

Hi,

Has anyone come across the situation described above?

Regards,
DrewJensen
Volunteer
Posts: 1734
Joined: Sat Oct 06, 2007 9:01 pm
Location: Cumberland, MD - USA

Re: html to doc

Post by DrewJensen »

OK - I don't use Java per se, but I do recall seeing a discussion before where the problem ended up being the hidden property - have you tried this with the document window not being hidden?
Former member of The Document Foundation
Former member of Apache OpenOffice PMC
LibreOffice on Ubuntu 18.04
codewriter
Posts: 7
Joined: Wed Dec 26, 2007 2:14 pm

Re: html to doc

Post by codewriter »

Hi, thanks.

Yes, I tried that too, but it does not seem to work.

Regards,
hol.sten
Volunteer
Posts: 495
Joined: Mon Oct 08, 2007 1:31 am
Location: Hamburg, Germany

Re: html to doc

Post by hol.sten »

codewriter wrote:and this does not ...

Code:

...
    String loadUrl = "http://www.xyz.com";
...
            conversionProperties[0] = new PropertyValue();
            conversionProperties[0].Name = "FilterName";
            conversionProperties[0].Value = "MS Word 97";
...
Exception thrown ...
com.sun.star.task.ErrorCodeIOException:
...
Problem: Loading HTML with OOo creates a Writer/Web document and NOT a Writer document. For that reason, you cannot use the com.sun.star.text.TextDocument filters, you have to use the com.sun.star.text.WebDocument filters.
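The point above is that the store filter has to match the kind of document that was loaded. As a rough illustration, the pairing can be written as a lookup keyed by document service and target format. This is only a sketch: the class StoreFilters and the method filterFor are hypothetical names, and the table contains only pairings that are mentioned in this thread. An empty result means the thread names no filter for that pair, not that OOo has none.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class StoreFilters {

    // Key: "<document service>|<target format>"; value: filter name quoted in this thread.
    private static final Map<String, String> FILTERS = new HashMap<>();

    static {
        FILTERS.put("com.sun.star.text.TextDocument|doc", "MS Word 97");
        FILTERS.put("com.sun.star.text.TextDocument|pdf", "writer_pdf_Export");
        FILTERS.put("com.sun.star.text.WebDocument|html", "HTML");
        FILTERS.put("com.sun.star.text.WebDocument|pdf", "writer_web_pdf_Export");
    }

    // Returns the filter name for the pair, or empty if the thread lists none.
    public static Optional<String> filterFor(String service, String format) {
        String key = service + "|" + format.toLowerCase(Locale.ROOT);
        return Optional.ofNullable(FILTERS.get(key));
    }

    public static void main(String[] args) {
        // The failing case: a Writer/Web document combined with a TextDocument filter.
        System.out.println(filterFor("com.sun.star.text.WebDocument", "doc"));
        System.out.println(filterFor("com.sun.star.text.WebDocument", "pdf"));
    }
}
```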

Here is an example of working Java code using the Bootstrap Connection Mechanism to load an HTML page and store it as HTML and PDF:

Code:

package oootest;

import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.Bootstrap;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XStorable;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;

public class BootstrapConnectionWebToHtmlAndPdfQuickAndDirty {
    public static void main(String[] args) {
        
        // Loading HTML creates a Writer/Web document! It does NOT create a Writer document!
        String loadUrl      = "http://www.xyz.com";

        // Store web page as HTML and PDF
        String storeUrlHtml = "file:///c:/dev/netbeans/oootest/xyz.html";
        String storeUrlPdf  = "file:///c:/dev/netbeans/oootest/xyz.pdf";

        try {
            XComponentContext xContext = Bootstrap.bootstrap();
            XMultiComponentFactory xMultiComponentFactory = xContext.getServiceManager();
            XComponentLoader xcomponentloader = (XComponentLoader) UnoRuntime.queryInterface(XComponentLoader.class,xMultiComponentFactory.createInstanceWithContext("com.sun.star.frame.Desktop", xContext));

            Object objectDocumentToStore = xcomponentloader.loadComponentFromURL(loadUrl, "_blank", 0, new PropertyValue[0]);

            // Sometimes loading from the web needs some time.
            // 4000 waits for 4 seconds. Try different settings.
            Thread.sleep(4000);

            XStorable xstorable = (XStorable) UnoRuntime.queryInterface(XStorable.class,objectDocumentToStore);

            // Filter names are listed at http://wiki.services.openoffice.org/wiki/Framework/Article/Filter/FilterList_OOo_2_1
            PropertyValue[] conversionProperties = new PropertyValue[1];
            conversionProperties[0] = new PropertyValue();
            conversionProperties[0].Name = "FilterName";

            // Store Writer/Web document as HTML
            conversionProperties[0].Value = "HTML";
            xstorable.storeToURL(storeUrlHtml,conversionProperties);

            // Store Writer/Web document as PDF
            conversionProperties[0].Value = "writer_web_pdf_Export";
            xstorable.storeToURL(storeUrlPdf,conversionProperties);
        }
        catch (java.lang.Exception e) {
            e.printStackTrace();
        }
        finally {
            System.exit(0);
        }
    }    
}
Please add '[Solved]' to the title of your first post (edit button) if your issue has been fixed.

Regards
hol.sten
hol.sten
Volunteer
Posts: 495
Joined: Mon Oct 08, 2007 1:31 am
Location: Hamburg, Germany

Re: html to doc

Post by hol.sten »

Here is an example of working Java code using the Bootstrap Connection Mechanism to load an HTML document, an ODT document and an ODS document and store each as a PDF:

Code:

package oootest;

import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.Bootstrap;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XStorable;
import com.sun.star.io.IOException;
import com.sun.star.lang.IllegalArgumentException;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.lang.XServiceInfo;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;

public class BootstrapConnectionDocumentToPdfQuickAndDirty {
    public static void main(String[] args) {
        // Load documents
        String loadUrlHtml = "file:///c:/dev/netbeans/oootest/my.htm";
        String loadUrlOdt  = "file:///c:/dev/netbeans/oootest/my.odt";
        String loadUrlOds  = "file:///c:/dev/netbeans/oootest/my.ods";

        // Store documents
        String storeUrlHtml = "file:///c:/dev/netbeans/oootest/my.htm.pdf";
        String storeUrlOdt  = "file:///c:/dev/netbeans/oootest/my.odt.pdf";
        String storeUrlOds  = "file:///c:/dev/netbeans/oootest/my.ods.pdf";

        try {
            XComponentContext xContext = Bootstrap.bootstrap();
            XMultiComponentFactory xMultiComponentFactory = xContext.getServiceManager();
            XComponentLoader xcomponentloader = (XComponentLoader) UnoRuntime.queryInterface(XComponentLoader.class,xMultiComponentFactory.createInstanceWithContext("com.sun.star.frame.Desktop", xContext));

            convertDocumentToPdf(xcomponentloader,loadUrlHtml,storeUrlHtml);
            convertDocumentToPdf(xcomponentloader,loadUrlOdt,storeUrlOdt);
            convertDocumentToPdf(xcomponentloader,loadUrlOds,storeUrlOds);
        }
        catch (java.lang.Exception e) {
            e.printStackTrace();
        }
        finally {
            System.exit(0);
        }
    }

    // loadUrl may point to any supported document (HTML, ODT, ODS), not only HTML.
    private static void convertDocumentToPdf(XComponentLoader xcomponentloader, String loadUrl, String storeUrl) throws IOException, InterruptedException, IllegalArgumentException {
        Object document = xcomponentloader.loadComponentFromURL(loadUrl, "_blank", 0, new PropertyValue[0]);

        // Sometimes loading needs some time. 4000 waits for 4 seconds. Try different settings if needed.
        Thread.sleep(4000);

        storePDF(document, storeUrl);
    }

    private static void storePDF(Object document,String storeUrl) throws IOException {
        // Determine suitable filter name for PDF export by asking XServiceInfo.
        // Source: OOo Developer's Guide - 7 Office Development - 7.1.5 Handling Documents - Storing Documents
        // http://api.openoffice.org/docs/DevelopersGuide/OfficeDev/OfficeDev.xhtml#1_1_5_3_Storing_Documents
        //
        // Filter names are listed at http://wiki.services.openoffice.org/wiki/Framework/Article/Filter/FilterList_OOo_2_1
        XServiceInfo xInfo = (XServiceInfo) UnoRuntime.queryInterface(XServiceInfo.class,document);
        String storeFilter = null;
        if(xInfo!=null) {
            if(xInfo.supportsService("com.sun.star.text.TextDocument")) {
              storeFilter = "writer_pdf_Export";
            }
            else if(xInfo.supportsService("com.sun.star.text.WebDocument")) {
              storeFilter = "writer_web_pdf_Export";
            }
            else if(xInfo.supportsService("com.sun.star.sheet.SpreadsheetDocument")) {
              storeFilter = "calc_pdf_Export";
            }
        }

        if (storeFilter != null) {
            PropertyValue[] conversionProperties = new PropertyValue[2];
            conversionProperties[0] = new PropertyValue();
            conversionProperties[0].Name = "FilterName";
            conversionProperties[0].Value = storeFilter;
            conversionProperties[1] = new PropertyValue();
            // Property name must be exactly "Overwrite" (no trailing space).
            conversionProperties[1].Name = "Overwrite";
            conversionProperties[1].Value = Boolean.TRUE;

            XStorable xstorable = (XStorable) UnoRuntime.queryInterface(XStorable.class,document);
            xstorable.storeToURL(storeUrl, conversionProperties);
        }
    }
}
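The fixed Thread.sleep(4000) in the example above is a guess that has to be tuned by hand. One possible alternative, sketched here with plain Java only, is to poll a readiness check until it succeeds or a timeout expires. The helper name waitUntil is hypothetical, and what to test for readiness is left to the caller as an assumption, since the thread does not say which condition signals that the document has finished loading.

```java
import java.util.function.BooleanSupplier;

public class WaitUntil {

    // Polls 'ready' every pollMillis until it returns true or timeoutMillis has elapsed.
    // Returns true if the condition became true, false on timeout or interruption.
    public static boolean waitUntil(BooleanSupplier ready, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (deadline - System.nanoTime() <= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition for the demo: becomes true after roughly 200 ms.
        boolean ok = waitUntil(() -> System.currentTimeMillis() - start >= 200, 4000, 20);
        System.out.println("ready=" + ok);
    }
}
```

With a helper like this, the upper bound of 4 seconds from the example is kept, but fast loads do not pay the full delay.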
Regards
hol.sten
DrewJensen
Volunteer
Posts: 1734
Joined: Sat Oct 06, 2007 9:01 pm
Location: Cumberland, MD - USA

Re: html to doc

Post by DrewJensen »

hol.sten,

Well, I am sure the OP will appreciate that example and I know I do - this will help with a project I have on the drawing board immensely. Thanks.
codewriter
Posts: 7
Joined: Wed Dec 26, 2007 2:14 pm

Re: html to doc

Post by codewriter »

Hi, thanks very much.

I have now used writer_web_StarOffice_XML_Writer to store it as a .doc file.

I still have one issue: the images do not seem to be embedded in the .doc file when the page is loaded from the web URL. Can I somehow embed the images available on the HTML page into the generated .doc file?
The reason is that the generated output file will also be used offline, where no internet connection is available.

Regards,
sarath
Posts: 10
Joined: Tue Oct 27, 2009 5:50 pm

Re: html to doc

Post by sarath »

GOOO 3.1.1 on Windows XP
kunal14
Posts: 1
Joined: Wed Dec 30, 2015 2:58 pm

Re: html to doc

Post by kunal14 »

If the HTML is held in a string such as "<html>...</html>", how can it be converted to .doc?
OpenOffice 4.1.2 on windows 7
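One way to approach the question above, offered as a sketch rather than a confirmed answer from this thread, is to write the string to a temporary .html file and use the resulting file URL as the loadUrl in the examples earlier in the thread. The code below uses only the Java standard library. The class name HtmlStringToFileUrl, the method writeToTempFile, the "html2doc-" file prefix and the choice of UTF-8 are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HtmlStringToFileUrl {

    // Writes the HTML string to a temporary file and returns its file URL.
    public static String writeToTempFile(String html) {
        try {
            Path tmp = Files.createTempFile("html2doc-", ".html");
            // UTF-8 is an assumption; the HTML should declare the same charset.
            Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
            tmp.toFile().deleteOnExit(); // clean up when the JVM exits
            return tmp.toUri().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String loadUrl = writeToTempFile("<html><body><p>Hello</p></body></html>");
        // This URL could then be passed to loadComponentFromURL as in the examples above.
        System.out.println(loadUrl);
    }
}
```

Because the document is then loaded from an HTML file, the earlier remark about Writer/Web documents and their filters would apply to it as well.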