Need Help to Analyze Keyword Density

gimmeexcel

Board Regular
Joined
May 8, 2009
Messages
95
Hello everyone,

I need some assistance please. I need a formula for calculating the density of words in a column. I'm thinking this would involve some sort of VLOOKUP.

It will need to count every instance of each keyword and every phrase variation. Then it will determine the density of each one.

For example: "Totally Free DVD Players" This is the reference, it would be column A

Word Counts would be in Column B

1 Word Counts
Totally
Free
DVD
Players

2 Word Counts
Totally Free
Free DVD
DVD Players

3 Word Counts
Totally Free DVD
Free DVD Players

4 Word Counts
Totally Free DVD Players

Let's assume that there are 100 cells from A 2 - A 101. The formula would need to capture somehow every word in the spreadsheet, reference against column A, and determine the density for every phrase starting with 1 word.

Example: The title of a Web page is 'Get Best XYZ Services'. Keyword Density for 'XYZ services' is 2*1/4*100%=50%. If you reduce the number of words in the title by removing the word 'get', so the title becomes 'Best XYZ Services', than the keyword weight will be larger: 2*1/3*100%=67%

Finally, I need to get an average for each density in the spreadsheet. I'd like this summary report to be created in a new worksheet.

I have link to a webpage that helps explain it better: http://www.web1marketing.com/blog/index.php/archives/calculating-keyword-density-in-excel/

Well, I guess that's it. Thanks in advance for any suggestions
 

Excel Facts

How to calculate loan payments in Excel?
Use the PMT function: =PMT(5%/12,60,-25000) is for a $25,000 loan, 5% annual interest, 60 month loan.
Hello,
This is really a deceptively difficult problem. The issue, as I see it, is that you have have non-normal non-structured data and you want to analyze multiple levels of its normalized data model.
Let's call the raw data "Sentences". So "Totally Free DVD Players" is a sentence. But then for each sentence there can be many phrases. The phrases of "Totally Free DVD Players" would be:

  1. Totally
  2. Totally Free
  3. Totally Free DVD
  4. Totally Free DVD Players
  5. Free
  6. Free DVD
  7. Free DVD Players
  8. DVD
  9. DVD Players
  10. Players

This is a triangular result and can be precalculated by N(N+1)/2 where N=The number of words in the phrase (4*5=20/2=10).

To make matters more fun, once the sentences are split out into phrases there can be overlap. So the sentence "Totally Free DVD Players" and the sentence "Totally Free Checking" can have overlap. Is the phrase "Totally Free" from sentence 1 counted the same as the phrase from sentence 2? If so then you now need to reduce make the dataset to unique value (remove duplicates).

Once you have a unique dataset of phrases you still need to be able to count the number of occurrences of each unique value in the non-unique dataset. So you now are up to three datasets (the original, the phrases, and the unique phrases).

Finally you need to do the name thing with words. Which must be handled separately from phrases.
<o:p> </o:p>
At any rate, as you can imagine, untangling all of this will take a lot of code. The problem interested me so I coded up a solution, but I don’t think it’s very self explanatory without seeing an example. Thusly, I have included a sample file in this post:)


Code:
Option Explicit
 
'Module's code By Aaron Bush 06/21/2009
'Free for reuse. Please retain credit for my work.
'Free to alter, add your own credit for what you change:)

Public Function GetWord(ByRef target As Excel.Range, ByVal wordNumber As Long) As String
    On Error Resume Next
    GetWord = GetWords(target)(wordNumber - 1&)
End Function

Public Function GetWords(ByRef target As Excel.Range) As String()
    GetWords = Split(ScrubEntry(target.Cells(1, 1).value))
End Function
Public Function GetPhrase(ByVal target As Excel.Range, ByVal phraseNumber As Long) As String
    On Error Resume Next
    GetPhrase = GetPhrases(target)(phraseNumber - 1&)
End Function

Public Function GetPhrases(ByVal target As Excel.Range) As String()
    Dim strWords() As String
    Dim strPhrases() As String
    Dim lngIndx As Long
    Dim lngSlide As Long
    Dim lngSrtPt As Long
    Dim lngEndPt As Long
    Dim lngUprBnd As Long
    Dim strPhrase As String
    strWords = GetWords(target)
    lngUprBnd = UBound(strWords)
    ReDim strPhrases((((lngUprBnd + 1&) * (lngUprBnd + 2&)) \ 2&) - 1&)
    For lngSrtPt = 0& To lngUprBnd
        For lngEndPt = lngSrtPt To lngUprBnd
            strPhrase = vbNullString
            For lngSlide = lngSrtPt To lngEndPt
                strPhrase = strPhrase & (strWords(lngSlide) & " ")
            Next
            strPhrases(lngIndx) = RTrim$(strPhrase)
            lngIndx = lngIndx + 1&
        Next
    Next
    GetPhrases = strPhrases
End Function

Public Function GetWordCount(ByVal target As Excel.Range) As Long
    Dim strVal As String
    strVal = ScrubEntry(target.Cells(1&, 1&).value)
    GetWordCount = Len(strVal) - Len(Replace(strVal, " ", vbNullString, compare:=vbBinaryCompare)) + 1&
End Function

Public Function GetPhraseCount(ByVal target As Excel.Range) As Long
    Dim lngWrdCnt As Long
    lngWrdCnt = GetWordCount(target)
    GetPhraseCount = (lngWrdCnt * (lngWrdCnt + 1&)) \ 2&
End Function

Public Function ScrubEntry(ByVal value As String) As String
    Dim strRtnVal As String
    strRtnVal = Trim$(value)
    Do While InStrB(strRtnVal, "  ")
        strRtnVal = Replace(strRtnVal, "  ", " ", compare:=vbBinaryCompare)
    Loop
    ScrubEntry = strRtnVal
End Function

Private Sub AppendArray(ByRef target() As String, ByRef source() As String)
    Dim lngUprBndTrgt As Long
    Dim lngUprBndSrc As Long
    Dim lngUprBndNew As Long
    Dim lngIndxTrgt As Long
    Dim lngIndxSrc As Long
    lngUprBndTrgt = UBound(target)
    lngUprBndSrc = UBound(source)
    lngUprBndNew = lngUprBndTrgt + lngUprBndSrc + 1&
    ReDim Preserve target(lngUprBndNew) As String
    For lngIndxTrgt = lngUprBndTrgt + 1& To lngUprBndNew
        target(lngIndxTrgt) = source(lngIndxSrc)
        lngIndxSrc = lngIndxSrc + 1&
    Next
End Sub

Public Sub RefreshUniqueWords()
    Const lngSrcSheet_c As String = "Sentences"
    Dim rngData As Excel.Range
    Dim strWords() As String
    Dim lngRow As Long
    Dim lngBtmRow As Long
    Dim lngTopRow As Long
    Set rngData = ThisWorkbook.Worksheets(lngSrcSheet_c).UsedRange.Columns(1&)
    Select Case MsgBox("Does the selection include headers?", vbQuestion + vbYesNoCancel + vbDefaultButton1)
        Case vbCancel: Exit Sub
        Case vbYes: lngTopRow = 2&
        Case vbNo: lngTopRow = 1&
        Case Else: Err.Raise vbObjectError, , "Unexpected response"
    End Select
    lngBtmRow = rngData.Cells.Count
    For lngRow = lngTopRow To lngBtmRow
        If LenB(rngData.Cells(lngRow, 1&).value) Then
            strWords = GetWords(rngData.Cells(lngRow, 1&))
            Exit For
        End If
    Next
    For lngRow = lngRow + 1& To lngBtmRow
        AppendArray strWords, GetWords(rngData.Cells(lngRow, 1&))
    Next
    Output SelectUnique(strWords), "Unique Words", "Words"
End Sub

Public Sub RefreshUniquePhrases()
    Const lngSrcSheet_c As String = "Sentences"
    Dim rngData As Excel.Range
    Dim strPhrases() As String
    Dim lngRow As Long
    Dim lngBtmRow As Long
    Dim lngTopRow As Long
    Set rngData = ThisWorkbook.Worksheets(lngSrcSheet_c).UsedRange.Columns(1&)
    Select Case MsgBox("Does the selection include headers?", vbQuestion + vbYesNoCancel + vbDefaultButton1)
        Case vbCancel: Exit Sub
        Case vbYes: lngTopRow = 2&
        Case vbNo: lngTopRow = 1&
        Case Else: Err.Raise vbObjectError, , "Unexpected response"
    End Select
    lngBtmRow = rngData.Cells.Count
    For lngRow = lngTopRow To lngBtmRow
        If LenB(rngData.Cells(lngRow, 1&).value) Then
            strPhrases = GetPhrases(rngData.Cells(lngRow, 1&))
            Exit For
        End If
    Next
    For lngRow = lngRow + 1& To lngBtmRow
        AppendArray strPhrases, GetPhrases(rngData.Cells(lngRow, 1&))
    Next
    Output SelectUnique(strPhrases), "Unique Phrases", "Phrases"
End Sub

Private Function SelectUnique(ByRef value() As String, Optional ByVal compare As VbCompareMethod = VbCompareMethod.vbTextCompare) As String()
    Const lngMatch_c As Long = 0&
    Dim strRtnVal() As String
    Dim lngIndx As Long
    Dim lngIndx2 As Long
    Dim lngUprBnd As Long
    Dim lngUprBnd2 As Long
    lngUprBnd2 = -1&
    lngUprBnd = UBound(value)
    ReDim strRtnVal(lngUprBnd) As String
    For lngIndx = 0& To lngUprBnd
        For lngIndx2 = 0& To lngUprBnd2
            If StrComp(value(lngIndx), strRtnVal(lngIndx2), compare) = lngMatch_c Then Exit For
        Next
        If lngIndx2 > lngUprBnd2 Then
            lngUprBnd2 = lngUprBnd2 + 1&
            strRtnVal(lngUprBnd2) = value(lngIndx)
        End If
    Next
    ReDim Preserve strRtnVal(lngUprBnd2) As String
    SelectUnique = strRtnVal
End Function

Private Sub Output(ByRef value() As String, ByVal sheetName As String, ByVal sourceSheet As String)
    Const lngMatch_c As Long = 0&
    Dim ws As Excel.Worksheet
    Dim lngIndx As Long
    For Each ws In ThisWorkbook.Worksheets
        If StrComp(ws.Name, sheetName, vbTextCompare) = lngMatch_c Then Exit For
    Next
    If ws Is Nothing Then
        Set ws = ActiveWorkbook.Worksheets.Add
        ws.Name = sheetName
    Else
        ws.UsedRange.Delete
    End If
    ws.Range("A1:C1").value = Array("Value", "Count", "Percent")
    ws.Columns(3).NumberFormat = "0.0%"
    For lngIndx = 0& To UBound(value)
        ws.Cells(lngIndx + 2, 1&).value = value(lngIndx)
    Next
    ws.Range("B2:B" & lngIndx + 1&).value = "=COUNTIF(" & ThisWorkbook.Worksheets(sourceSheet).UsedRange.Address(True, True, xlR1C1, True) & ",RC[-1])"
    ws.Range("C2:C" & lngIndx + 1&).value = "=RC[-1]/" & lngIndx
    
    ws.UsedRange.Sort ws.Cells(1&, 2&), xlDescending, HEADER:=xlYes
End Sub

 
Upvote 0
Hello,
This is really a deceptively difficult problem. The issue, as I see it, is that you have have non-normal non-structured data and you want to analyze multiple levels of its normalized data model.
Let's call the raw data "Sentences". So "Totally Free DVD Players" is a sentence. But then for each sentence there can be many phrases. The phrases of "Totally Free DVD Players" would be:

  1. Totally
  2. Totally Free
  3. Totally Free DVD
  4. Totally Free DVD Players
  5. Free
  6. Free DVD
  7. Free DVD Players
  8. DVD
  9. DVD Players
  10. Players

This is a triangular result and can be precalculated by N(N+1)/2 where N=The number of words in the phrase (4*5=20/2=10).

To make matters more fun, once the sentences are split out into phrases there can be overlap. So the sentence "Totally Free DVD Players" and the sentence "Totally Free Checking" can have overlap. Is the phrase "Totally Free" from sentence 1 counted the same as the phrase from sentence 2? If so then you now need to reduce make the dataset to unique value (remove duplicates).

Once you have a unique dataset of phrases you still need to be able to count the number of occurrences of each unique value in the non-unique dataset. So you now are up to three datasets (the original, the phrases, and the unique phrases).

Finally you need to do the name thing with words. Which must be handled separately from phrases.
<o:p> </o:p>
At any rate, as you can imagine, untangling all of this will take a lot of code. The problem interested me so I coded up a solution, but I don’t think it’s very self explanatory without seeing an example. Thusly, I have included a sample file in this post:)


Code:
Option Explicit
 
'Module's code By Aaron Bush 06/21/2009
'Free for reuse. Please retain credit for my work.
'Free to alter, add your own credit for what you change:)

Public Function GetWord(ByRef target As Excel.Range, ByVal wordNumber As Long) As String
    On Error Resume Next
    GetWord = GetWords(target)(wordNumber - 1&)
End Function

Public Function GetWords(ByRef target As Excel.Range) As String()
    GetWords = Split(ScrubEntry(target.Cells(1, 1).value))
End Function
Public Function GetPhrase(ByVal target As Excel.Range, ByVal phraseNumber As Long) As String
    On Error Resume Next
    GetPhrase = GetPhrases(target)(phraseNumber - 1&)
End Function

Public Function GetPhrases(ByVal target As Excel.Range) As String()
    Dim strWords() As String
    Dim strPhrases() As String
    Dim lngIndx As Long
    Dim lngSlide As Long
    Dim lngSrtPt As Long
    Dim lngEndPt As Long
    Dim lngUprBnd As Long
    Dim strPhrase As String
    strWords = GetWords(target)
    lngUprBnd = UBound(strWords)
    ReDim strPhrases((((lngUprBnd + 1&) * (lngUprBnd + 2&)) \ 2&) - 1&)
    For lngSrtPt = 0& To lngUprBnd
        For lngEndPt = lngSrtPt To lngUprBnd
            strPhrase = vbNullString
            For lngSlide = lngSrtPt To lngEndPt
                strPhrase = strPhrase & (strWords(lngSlide) & " ")
            Next
            strPhrases(lngIndx) = RTrim$(strPhrase)
            lngIndx = lngIndx + 1&
        Next
    Next
    GetPhrases = strPhrases
End Function

Public Function GetWordCount(ByVal target As Excel.Range) As Long
    Dim strVal As String
    strVal = ScrubEntry(target.Cells(1&, 1&).value)
    GetWordCount = Len(strVal) - Len(Replace(strVal, " ", vbNullString, compare:=vbBinaryCompare)) + 1&
End Function

Public Function GetPhraseCount(ByVal target As Excel.Range) As Long
    Dim lngWrdCnt As Long
    lngWrdCnt = GetWordCount(target)
    GetPhraseCount = (lngWrdCnt * (lngWrdCnt + 1&)) \ 2&
End Function

Public Function ScrubEntry(ByVal value As String) As String
    Dim strRtnVal As String
    strRtnVal = Trim$(value)
    Do While InStrB(strRtnVal, "  ")
        strRtnVal = Replace(strRtnVal, "  ", " ", compare:=vbBinaryCompare)
    Loop
    ScrubEntry = strRtnVal
End Function

Private Sub AppendArray(ByRef target() As String, ByRef source() As String)
    Dim lngUprBndTrgt As Long
    Dim lngUprBndSrc As Long
    Dim lngUprBndNew As Long
    Dim lngIndxTrgt As Long
    Dim lngIndxSrc As Long
    lngUprBndTrgt = UBound(target)
    lngUprBndSrc = UBound(source)
    lngUprBndNew = lngUprBndTrgt + lngUprBndSrc + 1&
    ReDim Preserve target(lngUprBndNew) As String
    For lngIndxTrgt = lngUprBndTrgt + 1& To lngUprBndNew
        target(lngIndxTrgt) = source(lngIndxSrc)
        lngIndxSrc = lngIndxSrc + 1&
    Next
End Sub

Public Sub RefreshUniqueWords()
    Const lngSrcSheet_c As String = "Sentences"
    Dim rngData As Excel.Range
    Dim strWords() As String
    Dim lngRow As Long
    Dim lngBtmRow As Long
    Dim lngTopRow As Long
    Set rngData = ThisWorkbook.Worksheets(lngSrcSheet_c).UsedRange.Columns(1&)
    Select Case MsgBox("Does the selection include headers?", vbQuestion + vbYesNoCancel + vbDefaultButton1)
        Case vbCancel: Exit Sub
        Case vbYes: lngTopRow = 2&
        Case vbNo: lngTopRow = 1&
        Case Else: Err.Raise vbObjectError, , "Unexpected response"
    End Select
    lngBtmRow = rngData.Cells.Count
    For lngRow = lngTopRow To lngBtmRow
        If LenB(rngData.Cells(lngRow, 1&).value) Then
            strWords = GetWords(rngData.Cells(lngRow, 1&))
            Exit For
        End If
    Next
    For lngRow = lngRow + 1& To lngBtmRow
        AppendArray strWords, GetWords(rngData.Cells(lngRow, 1&))
    Next
    Output SelectUnique(strWords), "Unique Words", "Words"
End Sub

Public Sub RefreshUniquePhrases()
    Const lngSrcSheet_c As String = "Sentences"
    Dim rngData As Excel.Range
    Dim strPhrases() As String
    Dim lngRow As Long
    Dim lngBtmRow As Long
    Dim lngTopRow As Long
    Set rngData = ThisWorkbook.Worksheets(lngSrcSheet_c).UsedRange.Columns(1&)
    Select Case MsgBox("Does the selection include headers?", vbQuestion + vbYesNoCancel + vbDefaultButton1)
        Case vbCancel: Exit Sub
        Case vbYes: lngTopRow = 2&
        Case vbNo: lngTopRow = 1&
        Case Else: Err.Raise vbObjectError, , "Unexpected response"
    End Select
    lngBtmRow = rngData.Cells.Count
    For lngRow = lngTopRow To lngBtmRow
        If LenB(rngData.Cells(lngRow, 1&).value) Then
            strPhrases = GetPhrases(rngData.Cells(lngRow, 1&))
            Exit For
        End If
    Next
    For lngRow = lngRow + 1& To lngBtmRow
        AppendArray strPhrases, GetPhrases(rngData.Cells(lngRow, 1&))
    Next
    Output SelectUnique(strPhrases), "Unique Phrases", "Phrases"
End Sub

Private Function SelectUnique(ByRef value() As String, Optional ByVal compare As VbCompareMethod = VbCompareMethod.vbTextCompare) As String()
    Const lngMatch_c As Long = 0&
    Dim strRtnVal() As String
    Dim lngIndx As Long
    Dim lngIndx2 As Long
    Dim lngUprBnd As Long
    Dim lngUprBnd2 As Long
    lngUprBnd2 = -1&
    lngUprBnd = UBound(value)
    ReDim strRtnVal(lngUprBnd) As String
    For lngIndx = 0& To lngUprBnd
        For lngIndx2 = 0& To lngUprBnd2
            If StrComp(value(lngIndx), strRtnVal(lngIndx2), compare) = lngMatch_c Then Exit For
        Next
        If lngIndx2 > lngUprBnd2 Then
            lngUprBnd2 = lngUprBnd2 + 1&
            strRtnVal(lngUprBnd2) = value(lngIndx)
        End If
    Next
    ReDim Preserve strRtnVal(lngUprBnd2) As String
    SelectUnique = strRtnVal
End Function

Private Sub Output(ByRef value() As String, ByVal sheetName As String, ByVal sourceSheet As String)
    Const lngMatch_c As Long = 0&
    Dim ws As Excel.Worksheet
    Dim lngIndx As Long
    For Each ws In ThisWorkbook.Worksheets
        If StrComp(ws.Name, sheetName, vbTextCompare) = lngMatch_c Then Exit For
    Next
    If ws Is Nothing Then
        Set ws = ActiveWorkbook.Worksheets.Add
        ws.Name = sheetName
    Else
        ws.UsedRange.Delete
    End If
    ws.Range("A1:C1").value = Array("Value", "Count", "Percent")
    ws.Columns(3).NumberFormat = "0.0%"
    For lngIndx = 0& To UBound(value)
        ws.Cells(lngIndx + 2, 1&).value = value(lngIndx)
    Next
    ws.Range("B2:B" & lngIndx + 1&).value = "=COUNTIF(" & ThisWorkbook.Worksheets(sourceSheet).UsedRange.Address(True, True, xlR1C1, True) & ",RC[-1])"
    ws.Range("C2:C" & lngIndx + 1&).value = "=RC[-1]/" & lngIndx
    
    ws.UsedRange.Sort ws.Cells(1&, 2&), xlDescending, HEADER:=xlYes
End Sub


Hi Oorang,

Sorry that it took so long to get back to you. However, I did not get the auto-email, so I did not know that you had replied.

This is very impressive! Are you a mathematician? I don't know where to start; I'm studying the example to try and figure this one out.

Thanks so much. I will be in touch soon.
 
Upvote 0
I got an error when I tried to run it: Run-time error '9':

Subscript out of range

Code:
Set rngData = ThisWorkbook.Worksheets(lngSrcSheet_c).UsedRange.Columns(1&)
 
Upvote 0
The example I gave assumes that the list of sentences resides in column A of a sheet named "Sentences". I'd guess your sheet is named something different:)
 
Upvote 0
Not sure what I'm doing wrong. Here's what I did:

1. Copied my keywords to Column A of the Sentences Sheet
2. ALT + F11
3. Tools >> Macros
4. Ran the 1st Macro

The Sentences, Phrases, and Words worksheets updated the info. However, the Unique Phrases and Unique Words did not.

Did I follow the correct steps
 
Upvote 0
Yup I am here:) Did you take a look at the sample file? If you take a look at the phrases/words sheets you'll see those were created using UDFs. They were simple enough though that I didn't make anything to auto generate them. You just need to look at the formula(s) and copy/paste it over wide enough area. I wasn't really going for a production solution, just something generic enough that it could be used as a "swiss army knife" for that sort of problem. I tried to functionalize most of the code so you could use it a few different ways to be able to pull what you wanted out in not only the ways discussed, but if you wanted other metrics later they would be flexible enough for you to get what you want without writing a whole other solution.
 
Last edited:
Upvote 0

Forum statistics

Threads
1,214,591
Messages
6,120,426
Members
448,961
Latest member
nzskater

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top