scrape data (excel query / import data) from multiple webpages

muppet77

Board Regular
Joined
Jan 24, 2004
Messages
223
i am trying to import data from many websites into excel.
the problem involves football teams and how many corners they took in each match. (don't ask!)

each webage has a predictable pattern for the address, so i have worked out that only a number changes in the address by 1 to get the next page or match.

eg

http://www.vi.nl/Wedstrijden/Wedstrijd.htm?dbid=168066&typeofpage=84140

is the first match.
i am interested in gathering the two team names
SC Heerenveen
Excelsior

and the number of corners each ("Hoekschoppen")
4
5

i am trying to get these 4 bits of data into a row in excel, then underneath data from the next match:

http://www.vi.nl/Wedstrijden/Wedstrijd.htm?dbid=168065&typeofpage=84140

the number in the middle of the address decreases by 1 each time, so the first few would be

http://www.vi.nl/Wedstrijden/Wedstrijd.htm?dbid=168066&typeofpage=84140
http://www.vi.nl/Wedstrijden/Wedstrijd.htm?dbid=168065&typeofpage=84140
http://www.vi.nl/Wedstrijden/Wedstrijd.htm?dbid=168064&typeofpage=84140
http://www.vi.nl/Wedstrijden/Wedstrijd.htm?dbid=168063&typeofpage=84140
http://www.vi.nl/Wedstrijden/Wedstrijd.htm?dbid=168062&typeofpage=84140
http://www.vi.nl/Wedstrijden/Wedstrijd.htm?dbid=168061&typeofpage=84140
http://www.vi.nl/Wedstrijden/Wedstrijd.htm?dbid=168060&typeofpage=84140
http://www.vi.nl/Wedstrijden/Wedstrijd.htm?dbid=168059&typeofpage=84140

is there a way i can automate this process?
(i have a list of all 306 addresses / matches in excel)

any help / hints appreciated

thanks in advance.
 

Excel Facts

VLOOKUP to Left?
Use =VLOOKUP(A2,CHOOSE({1,2},$Z$1:$Z$99,$Y$1:$Y$99),2,False) to lookup Y values to left of Z values.
i have been searching high and low for help.

http://www.mrexcel.com/forum/showthread.php?t=291419&highlight=open+web+pages

this is code from Pluto (person not the planet!)

Sub WebDataImport()
'
' WebDataImport Macro
' Macro recorded 04/12/2007 by
'
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;http://www.vi.nl/Wedstrijden/Wedstrijd.htm?dbid=183757&typeofpage=84140", Destination:=Range( _
"B3"))
.Name = "THE TABLE NAME GOES HERE"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
End Sub

which scrapes the entire website, but ho wdo i find the 'name' of the table, rather than the whole page?

also how do i loop through all the pages?
 
Upvote 0
Sub Macro2()
'
' Macro2 Macro
' Macro recorded 07/04/2009 by ms
'

'
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;http://www.vi.nl/Wedstrijden/Wedstrijd.htm?dbid=168066&typeofpage=84140", _
Destination:=Range("A1"))
.Name = "Wedstrijd.htm?dbid=168066&typeofpage=84140_3"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "1,""wedstrijdTable2"""
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
End Sub

ok - this is a macro that i recorded to see what happens.
the webpage is one of many - can anyone help me with the "loop" that gets the next page (from list in post number 1) and refires it?

thanks
 
Upvote 0
One of my friend given this macro you can put all those source link(URL) in sheet1 of ur workbook make sure sheet2 and sheet3 is empty then run this macro
it is working fine in MS 2003.. not all the data you will get.. but only 35 columns in the first column will be copied and pasted in the sheet2 do a trial on only one link first and change based on your requirement
========================================
Sub Macro2()
a = 1
Sheets(3).Select
While Sheets(1).Cells(a, 1) <> ""
urladdress = "URL;" & Sheets(1).Cells(a, 1).Text

With ActiveSheet.QueryTables.Add(Connection:= _
urladdress, Destination:=Range( _
"$A$1"))
.Name = Right(Sheets(1).Cells(a, 1).Value, 80)
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
'.WebTables = "1,""wedstrijdTable2"""
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With


Range("A1").Copy Sheets(2).Cells(a, 1)
Range("A2").Copy Sheets(2).Cells(a, 2)
Range("A3").Copy Sheets(2).Cells(a, 3)
Range("A4").Copy Sheets(2).Cells(a, 4)
Range("A5").Copy Sheets(2).Cells(a, 5)
Range("A6").Copy Sheets(2).Cells(a, 6)
Range("A7").Copy Sheets(2).Cells(a, 7)
Range("A8").Copy Sheets(2).Cells(a, 8)
Range("A9").Copy Sheets(2).Cells(a, 9)
Range("A10").Copy Sheets(2).Cells(a, 10)
Range("A11").Copy Sheets(2).Cells(a, 11)
Range("A12").Copy Sheets(2).Cells(a, 12)
Range("A13").Copy Sheets(2).Cells(a, 13)
Range("A14").Copy Sheets(2).Cells(a, 14)
Range("A15").Copy Sheets(2).Cells(a, 15)
Range("A16").Copy Sheets(2).Cells(a, 16)
Range("A17").Copy Sheets(2).Cells(a, 17)
Range("A18").Copy Sheets(2).Cells(a, 18)
Range("A19").Copy Sheets(2).Cells(a, 19)
Range("A20").Copy Sheets(2).Cells(a, 20)
Range("A21").Copy Sheets(2).Cells(a, 21)
Range("A22").Copy Sheets(2).Cells(a, 22)
Range("A23").Copy Sheets(2).Cells(a, 23)
Range("A24").Copy Sheets(2).Cells(a, 24)
Range("A25").Copy Sheets(2).Cells(a, 25)
Range("A26").Copy Sheets(2).Cells(a, 26)
Range("A27").Copy Sheets(2).Cells(a, 27)
Range("A28").Copy Sheets(2).Cells(a, 28)
Range("A29").Copy Sheets(2).Cells(a, 29)
Range("a30").Copy Sheets(2).Cells(a, 30)
Range("A31").Copy Sheets(2).Cells(a, 31)
Range("A32").Copy Sheets(2).Cells(a, 32)
Range("A33").Copy Sheets(2).Cells(a, 33)
Range("A34").Copy Sheets(2).Cells(a, 34)
Range("A35").Copy Sheets(2).Cells(a, 35)


a = a + 1
Sheets(3).Cells.ClearContents
Wend
End Sub
 
Upvote 0

Forum statistics

Threads
1,215,066
Messages
6,122,948
Members
449,095
Latest member
nmaske

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top