+ Reply to Thread
Results 1 to 26 of 26

Getting Web page source (HTML or XML) in VBA?

  1. #1
    Registered User
    Join Date
    10-27-2006
    Posts
    18

    Getting Web page source (HTML or XML) in VBA?

    I would like to retrieve contents of a web page, be it HTML or XML, into VBA variable!

    Later, I would chop, cut, parse or extract the data I need.

    Both importing as XML or WebQueries is unsatisfactory for a certain number of pages I need. XML has bad schema, WebQuery tells me it can't find any data.

    I tried with WinHTTPRequest, but Excel gives me back error "undefined user type" in other words it doesn't recognize that object.

    Basically I want the source of web page to become a string in my VBA code. In other words that would be replication of funcionallity of
    Please Login or Register  to view this content.
    which is provided by AutoHotKey scripting language (if anyone used it, very simple and effective for many things).

    Can it be done in Excel's VBA?

  2. #2
    Registered User
    Join Date
    10-27-2006
    Posts
    18
    I get no takers on this one?

    Come on, this is chalenging, it's not like "How do I display a message box before deleting something?" or "How do I make cell A4 red?". Not that I don't think these don't deserve replies, they just have standard answers.

  3. #3
    Forum Contributor
    Join Date
    06-27-2006
    Posts
    310
    I've done some related stuff ,but I never found a way to download the source file.

    I did end up of having success downloading tables where I could reference individual elements in tables , count how many tables were on a page etc.

    Prior to that I was also seeking to download the source file, but abandoned that route once I able to selectively pick my data directly off the webpage.

    If you think it can be any help I can probably dig up an example.

  4. #4
    Registered User
    Join Date
    10-27-2006
    Posts
    18
    Thanks SuitedAces (are they red or black?)!

    The very reason I'm looking to download source is because using WebQuery to retrieve data fails to retrieve data from pages I need. I don't know the reason why is that so.

    Example:
    http://www.wagerline.com/Scores-Odds...ontID=9361&t=0

    Site displaying various betting and sports info. In this particular example we have a boxscore which has perfectly formatted HTML as source.

    You can verify that tables are really present by looking at HTML source or by using WebDeveloper add-on for Firefox which can display information on just about any part of the web page! This particular page has 5 of them: login box, two team statistics, technical fouls and team statistics.

    Why Excel's WebQuery can't pick any of them - I have no idea (it gives something like "connection is there, but there are no data"). That's why I need the source to parse out tables myself.

  5. #5
    Forum Contributor
    Join Date
    06-27-2006
    Posts
    310
    Quote Originally Posted by Riorin
    Thanks SuitedAces (are they red or black?)!

    The very reason I'm looking to download source is because using WebQuery to retrieve data fails to retrieve data from pages I need. I don't know the reason why is that so.

    Example:
    http://www.wagerline.com/Scores-Odds...ontID=9361&t=0

    Site displaying various betting and sports info. In this particular example we have a boxscore which has perfectly formatted HTML as source.

    You can verify that tables are really present by looking at HTML source or by using WebDeveloper add-on for Firefox which can display information on just about any part of the web page! This particular page has 5 of them: login box, two team statistics, technical fouls and team statistics.

    Why Excel's WebQuery can't pick any of them - I have no idea (it gives something like "connection is there, but there are no data"). That's why I need the source to parse out tables myself.
    I don't understand the question , red or black ?
    Please explain , it's probably an expression I'm not familiar with.

    I encountered identical problems with WebQuery , it's very limited and problematic and my first thought was the same as yours ...download the source code and parse it.
    I haven't been able to find a way to do that .

    But hacking my way through the properties of the html doc (I don't know a damn thing about html) I was able to write procedures that enabled me to extract everything I need.

    In fact I believe I have a closely related example to yours...downloading hockey scores.
    But with that link you gave me, that might be a whole different animal because I wasn't able to look at the source file from the menu.
    Last edited by SuitedAces; 01-20-2008 at 03:53 PM.

  6. #6
    Forum Contributor
    Join Date
    06-27-2006
    Posts
    310
    Ok, check that , I WAS able to open the source file the second time I tried.
    That was strange.

  7. #7
    Registered User
    Join Date
    10-27-2006
    Posts
    18
    Ok, check that , I WAS able to open the source file the second time I tried.
    How did you managed that?

    because I wasn't able to look at the source file from the menu.
    Mozilla Firefox (unlike IE) with its extensions or add-ons (WebDeveloper, FireBug, View Source Chart, TableTools etc.) is your friend. You'll find anything you ever wanted to know or extract from a particular web page and much more.

    I don't understand the question , red or black ?
    I thought your screenname came from poker terms, "suited connectors" 76s meaning 7 and 6 of the same suit like spades for example!
    Aces being cards of the same rank cannot in fact be suited, you can only have two red (hearts and diamonds) or two black (clubs and spades) aces in hole cards.

    But same terms have different meanings for different people. You might have meant a "dressed up (suited) champion in his field (Ace)"?

  8. #8
    Forum Contributor
    Join Date
    06-27-2006
    Posts
    310
    Quote Originally Posted by Riorin
    I thought your screenname came from poker terms, "suited connectors" 76s meaning 7 and 6 of the same suit like spades for example!
    Aces being cards of the same rank cannot in fact be suited, you can only have two red (hearts and diamonds) or two black (clubs and spades) aces in hole cards.
    LOL Ok I guess my mind is slow today.

    Here's an example file.
    The procedure will navigate to the number of pages specified by the StartDate and EndDate, I have it set at 3 dates (It can download the entire season).
    The code is rough draft but it's functional for an example.
    Attached Files Attached Files
    Last edited by SuitedAces; 01-20-2008 at 04:43 PM.

  9. #9
    Registered User
    Join Date
    10-27-2006
    Posts
    18
    Good stuff, SuitedAces!

    I guess if you can instantiate IE object you should be able to instantiate WinHTTPRequest object, however I fail at that!

    The simplest solution that I found to actualy getting "your hands on the source of web page" is this (in pseudocode):
    Please Login or Register  to view this content.
    Haven't implemented it yet, but it should work.

    I was just hoping for a neat all-Excel solution!

  10. #10
    Forum Contributor
    Join Date
    06-27-2006
    Posts
    310
    To be honest I'm not aware of what a WinHTTPRequest object is.

    As far as the source code goes, at this point I'm not convinced that parsing it would be any less time consuming than working with the HTML Document Object .
    But I think if you go that route you could simplify it by just copying to the clipboard.
    Because I know I have seen code to paste the contents of the clipboard in VBA , if you can get the sourcecode to the clipboard I can find you code to then just paste the clipboard .
    This way you can eliminate the use of an external file.
    Last edited by SuitedAces; 01-20-2008 at 06:10 PM.

  11. #11

  12. #12
    Registered User
    Join Date
    10-27-2006
    Posts
    18
    As far as the source code goes, at this point I'm not convinced that parsing it would be any less time consuming than working with the HTML Document Object .
    It wouldn't, parsing it would be certainly more time-consuming. I just don't know how to access HTML DOM programtically.

  13. #13
    Forum Contributor
    Join Date
    06-27-2006
    Posts
    310
    Give me a liitle time and I will put together a sub that will demonstrate the htlm doc properties that I am familiar with .
    With these I was able to write the procedure in the example I uploaded.
    Also I will find some links for the HTML Document Object Model (HTML DOM) reference.

  14. #14
    Forum Contributor
    Join Date
    06-27-2006
    Posts
    310
    Riorin here is a link to the HTML reference that I was using to determine what properties I could get from the html doc.
    http://msdn2.microsoft.com/en-us/lib...9(VS.85).aspx#

    I attached a procedure that I am working on to break Html docs down into elements after downloading.

    Placing the URL in B2 on the sheet will not work.( bugs)
    So for now I have the URL of the webpage that you gave me hardcoded in the procedure (you'll see it in the sheet module)

    So just run it and see that what it does is break out the htlm doc into various properties.
    Those are the properties that I used for downloading the scores in the other workbook I uploaded.

    The idea being that you look down the worksheet and match those elements to the webpage.

    From there you can start to write a procedure that extracts the information you need from the html doc.


    You aren't limited to the properties you see me using .
    But those are some of the properties I was able find success with.

    It might the case that not all of the properties that are contained in the HTLM reference are available in Excel.

    What I did was go through the reference and play around with the properties of the html doc by trial and error.
    Attached Files Attached Files
    Last edited by SuitedAces; 01-21-2008 at 01:26 PM.

  15. #15
    Registered User
    Join Date
    10-27-2006
    Posts
    18
    Great stuff SuitedAces. Thanks a ton!

  16. #16
    Forum Contributor
    Join Date
    06-27-2006
    Posts
    310
    Also be aware of properies like length which will give you counts of collection.

    If you break out the elements from the webpage that the hockey download uses and then compare that to the finished code you will get a very good idea on how you can use the properties of the html doc.

    I had a problem downloading the second file I gave you when setting IE.visible = false , so there still are a few minor bugs in that code.
    Last edited by SuitedAces; 01-21-2008 at 01:37 PM.

  17. #17
    Registered User
    Join Date
    12-17-2008
    Location
    Bethesda, MD
    Posts
    1
    Yes, very nice stuff, thank you! I was looking for the solution to the same problem (grabbing an html page source code since web query won't work in the page that I want to parse), Almighty Google directed me to this page, and I register just to take a look at your code and found something amazing.

  18. #18
    Registered User
    Join Date
    02-28-2009
    Location
    sdfsdf fsdf
    MS-Off Ver
    Excel 2003
    Posts
    1

    Re: Getting Web page source (HTML or XML) in VBA?

    This stuff is just what I was looking for but i can't seem to download the attachments. Can someone advise. Thanks

  19. #19
    Registered User
    Join Date
    03-18-2009
    Location
    Australia
    MS-Off Ver
    Excel 2003
    Posts
    2

    Re: Getting Web page source (HTML or XML) in VBA?

    I am trying to automate a Web query that pastes the data on a spreadsheet.
    It seems I am 95% there however yet an inch is as good as a mile.

    I need to step thru several web pages. I have a cell on the worksheet that creates the new wed address but a just need a command that will use the web address on the worksheet.

    As an example the cell P10 formula = http://www.thenewwebaddress.com./1950.xml

    Every new web address changes the _195?.xml to say _1951.xml then 1952.xml etc.

    I'm using this code I found on this site.

    _____________________
    Const MyUrl As String = Activesheet.cells("P10").String <<problem child

    With ActiveSheet.QueryTables.Add(Connection:= _
    "URL;" & MyUrl, Destination:=Cells(1, 1))
    _____________________

    Can anyone advise the code I need to use to grab the changed wed address from cell P10 on the worksheet?

    Thanks

  20. #20
    Registered User
    Join Date
    03-18-2009
    Location
    Australia
    MS-Off Ver
    Excel 2003
    Posts
    2

    Re: Getting Web page source (HTML or XML) in VBA?

    Making progress however every time I run the query again it pastes the new date X rows to the right. Is there a true/false I need to change?

    .FieldNames = True
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .PreserveFormatting = True
    .RefreshOnFileOpen = False
    .BackgroundQuery = False
    .RefreshStyle = xlInsertDeleteCells
    .SavePassword = False
    .SaveData = True
    .AdjustColumnWidth = True
    .RefreshPeriod = 0
    .WebSelectionType = xlEntirePage
    .WebFormatting = xlWebFormattingNone
    .WebPreFormattedTextToColumns = True
    .WebConsecutiveDelimitersAsOne = True
    .WebSingleBlockTextImport = False
    .WebDisableDateRecognition = False
    .WebDisableRedirections = False
    .Refresh BackgroundQuery:=False

  21. #21
    Registered User
    Join Date
    08-04-2010
    Location
    Portland, Oregon, USA
    MS-Off Ver
    Excel 2007
    Posts
    1

    Re: Getting Web page source (HTML or XML) in VBA?

    Brilliant code, SuitedAces!

    Working off your code, I was able to turn it into an automated share count extractor from Yahoo Finance (soon to port it to Google Finance as well). Sheet2 now has a list of ~500 stock symbols in column A, and column B is being filled in one by one (~4 seconds/stock symbol) with the respective share count. It's anything but robust at this point, but it's handling the job flawlessly.

    It uses two loops - one to check each of the cells of coded output for the correct piece of info, and one to run through the list of symbols, changing the URL each time.

    If anyone would like this code, it's attached...
    Attached Files Attached Files

  22. #22
    Registered User
    Join Date
    04-10-2012
    Location
    india
    MS-Off Ver
    Excel 2003
    Posts
    2

    Re: Getting Web page source (HTML or XML) in VBA?

    how to get the data from a particular tag in website?

  23. #23
    Forum Contributor
    Join Date
    04-21-2009
    Location
    England
    MS-Off Ver
    Excel 2016
    Posts
    710

    Re: Getting Web page source (HTML or XML) in VBA?

    Great coding and kind of along the lines of what i've been looking for - but is there a fix regarding the URL - ie - can this be called from cell A2 as originally designed?

  24. #24
    Forum Contributor arlu1201's Avatar
    Join Date
    09-09-2011
    Location
    Bangalore, India
    MS-Off Ver
    Excel 2003 & 2007
    Posts
    19,166

    Re: Getting Web page source (HTML or XML) in VBA?

    cmb80,

    This is an old thread. Its best if you discuss this in a new thread or in your own thread.
    If I have helped, Don't forget to add to my reputation (click on the star below the post)
    Don't forget to mark threads as "Solved" (Thread Tools->Mark thread as Solved)
    Use code tags when posting your VBA code: [code] Your code here [/code]

  25. #25
    Forum Contributor
    Join Date
    04-21-2009
    Location
    England
    MS-Off Ver
    Excel 2016
    Posts
    710

    Re: Getting Web page source (HTML or XML) in VBA?

    I have started a new thread but no responses

  26. #26
    Forum Contributor arlu1201's Avatar
    Join Date
    09-09-2011
    Location
    Bangalore, India
    MS-Off Ver
    Excel 2003 & 2007
    Posts
    19,166

    Re: Getting Web page source (HTML or XML) in VBA?

    You can bump it if you do not get a reply after 12 hrs of starting the thread.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1