Tech Support Forum banner
Status
Not open for further replies.

[SOLVED] Find repeated words and highlight them

24K views 21 replies 2 participants last post by  ShaileshShinde 
#1 ·
Hello Team,
I would like to create an macro which will highlight all repeated words [word length is what user needs to enter before running the macro] in full document and highlight all repeated word leaving the first one un-highlighted.

Can you please provide any references or sample for this?

Thanks,
Shailesh
 
#2 ·
Re: Find repeated words and highlight them

Hi Shailesh,

If you want to find repeated words, I can't see how inputting the word length is especially helpful. A Find/Replace can do as you ask without the need to specify the word length. A spell-check will also find repeated words.
 
#4 ·
Re: Find repeated words and highlight them

Hi Shail,

It would be better to have the exlusion words in a list, so that you don't miss other words that should be highlighted. For example, if you set the minimum length to 3, you'd not find & highlight Mr Mr.
 
#6 ·
Re: Find repeated words and highlight them

Hi Shail,

Try the following 'HilightDocumentDuplicates' macro. All you should have to do is to point the macro to a folder, then let it run. All documents in the folder will be processed. The exlusions list is defined by the 'StrExcl' variable in the 'ConcordanceBuilder' sub.
Code:
Option Explicit
Dim SBar As Boolean
Dim TrkStatus As Boolean
Dim StrFnd As String

Sub HilightDocumentDuplicates()
' Store current Status Bar status, then switch on
SBar = Application.DisplayStatusBar
Application.DisplayStatusBar = True
' Store current Track Changes status, then switch off
With ActiveDocument
  TrkStatus = .TrackRevisions
  .TrackRevisions = False
End With
' Turn Off Screen Updating
Application.ScreenUpdating = False
Dim strFolder As String, strFile As String, wdDoc As Document
Dim StrTmp As String, i As Long
'Prompt for the folder to process
strFolder = GetFolder
If strFolder = "" Then Exit Sub
strFile = Dir(strFolder & "\*.doc", vbNormal)
'Process each file in the folder
While strFile <> ""
  Set wdDoc = Documents.Open(FileName:=strFolder & "\" & strFile, _
              AddToRecentFiles:=False, Visible:=False)
  ' Report progress on Status Bar.
  Application.StatusBar = "Processing " & wdDoc.Name
  'Compile the Find concordance
  Call ConcordanceBuilder(wdDoc)
  'Process all words in the concordance
  For i = 1 To UBound(Split(StrFnd, " "))
    StrTmp = Split(StrFnd, " ")(i)
    With wdDoc.Range
      With .Find
        .ClearFormatting
        'Look for duplicated words only
        .Text = StrTmp & " " & StrTmp
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWholeWord = True
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        .Execute
      End With
      Do While .Find.Found
        'Highlight the 2nd word
        .Duplicate.Words.Last.HighlightColorIndex = wdBrightGreen
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
  Next
  wdDoc.Close SaveChanges:=True
  strFile = Dir()
Wend
Set wdDoc = Nothing
' Clear the Status Bar
Application.StatusBar = False
' Restore original Status Bar status
Application.DisplayStatusBar = SBar
' Restore original Track Changes status
ActiveDocument.TrackRevisions = TrkStatus
' Restore Screen Updating
Application.ScreenUpdating = True
End Sub

Function GetFolder() As String
Dim oFolder As Object
GetFolder = ""
Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
Set oFolder = Nothing
End Function

Sub ConcordanceBuilder(wdDoc As Document)
Dim StrIn As String, StrTmp As String, StrExcl As String
Dim i As Long, j As Long, k As Long
'Define the exlusions list
StrExcl = "a,am,and,are,as,at,be,but,by,can,cm,did,do,does,eg,en,eq,etc," & _
          "for,get,go,got,has,have,he,her,him,how,i,ie,if,in,into,is,it,its," & _
          "me,mi,mm,my,na,nb,no,not,of,off,ok,on,one,or,our,out,re,she,so," & _
          "the,their,them,they,t,to,was,we,were,who,will,would,yd,you,your"
With wdDoc
  'Get the document's text
  StrIn = .Content.Text
  'Strip out unwanted characters
  For i = 1 To 255
    Select Case i
      Case 1 To 38, 40 To 64, 91 To 96, 123 To 144, 147 To 191, 247
      StrIn = Replace(StrIn, Chr(i), " ")
    End Select
  Next
  'Convert smart single quotes to plain single quotes & delete any at the start/end of a word
  StrIn = Replace(Replace(Replace(Replace(StrIn, Chr(145), "'"), Chr(146), "'"), "' ", " "), " '", " ")
  'Convert to lowercase
  StrIn = " " & LCase(Trim(StrIn)) & " "
  'Process the exclusions list
  For i = 0 To UBound(Split(StrExcl, ","))
    StrIn = Replace(StrIn, " " & Split(StrExcl, ",")(i) & " ", " ")
  Next
  'Clean up any duplicate spaces
  While InStr(StrIn, "  ") > 0
    StrIn = Replace(StrIn, "  ", " ")
  Wend
  StrIn = " " & Trim(StrIn) & " "
  j = UBound(Split(StrIn, " "))
  For i = 1 To j
    StrTmp = Split(StrIn, " ")(1)
    'Find how many occurences of each word there are in the document
    While InStr(StrIn, " " & StrTmp & " ") > 0
      StrIn = Replace(StrIn, " " & StrTmp & " ", " ")
    Wend
    k = j - UBound(Split(StrIn, " "))
    'If there's more than one occurence, add the word to our Find list
    If k > 1 Then
      StrFnd = StrFnd & " " & StrTmp
    End If
    j = UBound(Split(StrIn, " "))
  Next
End With
End Sub
 
#8 ·
#10 ·
Re: Find repeated words and highlight them

Hi Shail,

The code works for me. Did you select a folder to process? Even though the code as posted works fine, here's a slightly improved version of the 'HilightDocumentDuplicates' sub:
Code:
Sub HilightDocumentDuplicates()
' Store current Status Bar status, then switch on
SBar = Application.DisplayStatusBar
Application.DisplayStatusBar = True
' Turn Off Screen Updating
Application.ScreenUpdating = False
Dim strFolder As String, strFile As String, wdDoc As Document
Dim StrTmp As String, i As Long
'Prompt for the folder to process
strFolder = GetFolder
If strFolder = "" Then Exit Sub
strFile = Dir(strFolder & "\*.doc", vbNormal)
'Process each file in the folder
While strFile <> ""
  Set wdDoc = Documents.Open(FileName:=strFolder & "\" & strFile, _
              AddToRecentFiles:=False, Visible:=False)
  ' Report progress on Status Bar.
  Application.StatusBar = "Processing " & strFile
  ' Store current Track Changes status, then switch off
  TrkStatus = wdDoc.TrackRevisions
  wdDoc.TrackRevisions = False
  'Compile the Find concordance
  Call ConcordanceBuilder(wdDoc)
  'Process all words in the concordance
  For i = 1 To UBound(Split(StrFnd, " "))
    StrTmp = Split(StrFnd, " ")(i)
    With wdDoc.Range
      With .Find
        .ClearFormatting
        'Look for duplicated words only
        .Text = StrTmp & " " & StrTmp
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWholeWord = True
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        .Execute
      End With
      Do While .Find.Found
        'Highlight the 2nd word
        .Duplicate.Words.Last.HighlightColorIndex = wdBrightGreen
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
  Next
  ' Restore original Track Changes status
  wdDoc.TrackRevisions = TrkStatus
  wdDoc.Close SaveChanges:=True
  strFile = Dir()
Wend
Set wdDoc = Nothing
' Clear the Status Bar
Application.StatusBar = False
' Restore original Status Bar status
Application.DisplayStatusBar = SBar
' Restore Screen Updating
Application.ScreenUpdating = True
End Sub
 
#11 ·
Re: Find repeated words and highlight them

Yes, I have selected the folder. This does not highlight the repeated word and also no error messages.
I open the word app with default blank document and press ALT+F8 run the macro. it ask for folder path selected the folder and the script runs, but when i open the document from the folder. no repeated words gets highlighted.

Thanks,
Shail
 
#15 ·
Re: Find repeated words and highlight them

Hi,

The repeated words here means it can appear anywhere in the document. for example...[Document template offers] these three words has appears twice in the document. so, these needs to be highlighted.

Sorry, if my question is not clear to you.

Thanks,
Shail
 
#16 ·
Re: Find repeated words and highlight them

Try this version of the 'HilightDocumentDuplicates' sub:
Code:
Sub HilightDocumentDuplicates()
' Store current Status Bar status, then switch on
SBar = Application.DisplayStatusBar
Application.DisplayStatusBar = True
' Turn Off Screen Updating
Application.ScreenUpdating = False
Dim strFolder As String, strFile As String, wdDoc As Document
Dim StrTmp As String, i As Long, bFnd As Boolean
'Prompt for the folder to process
strFolder = GetFolder
If strFolder = "" Then Exit Sub
strFile = Dir(strFolder & "\*.doc", vbNormal)
'Process each file in the folder
While strFile <> ""
  Set wdDoc = Documents.Open(FileName:=strFolder & "\" & strFile, _
              AddToRecentFiles:=False, Visible:=False)
  ' Report progress on Status Bar.
  Application.StatusBar = "Processing " & strFile
  ' Store current Track Changes status, then switch off
  TrkStatus = wdDoc.TrackRevisions
  wdDoc.TrackRevisions = False
  'Compile the Find concordance
  Call ConcordanceBuilder(wdDoc)
  'Process all words in the concordance
  For i = 1 To UBound(Split(StrFnd, " "))
    StrTmp = Split(StrFnd, " ")(i)
    bFnd = False
    With wdDoc.Range
      With .Find
        .ClearFormatting
        'Look for duplicated words only
        .Text = StrTmp
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWholeWord = True
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        .Execute
      End With
      Do While .Find.Found
        If bFnd = True Then
          .Duplicate.HighlightColorIndex = wdBrightGreen
        End If
        bFnd = True
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
  Next
  ' Restore original Track Changes status
  wdDoc.TrackRevisions = TrkStatus
  wdDoc.Close SaveChanges:=True
  strFile = Dir()
Wend
Set wdDoc = Nothing
' Clear the Status Bar
Application.StatusBar = False
' Restore original Status Bar status
Application.DisplayStatusBar = SBar
' Restore Screen Updating
Application.ScreenUpdating = True
End Sub
 
#18 ·
Re: Find repeated words and highlight them

Hi Shail,

Are any words getting highlighted? If some are and others aren't have you checked whether the missed words are in the exclusions list? In my testing, the code works for any 'normal' word that's not in the exclusions list.

You've got some terms in your document that don't count as 'normal' words, though, such as C/C++. Words like that won't be processed correctly by my code because (a) the pre-processing strips out the / and + characters) and treats the two Cs as separate words. I can code around that, but it would be useful if you could tell me what particular words are causing problems.
 
#19 ·
Re: Find repeated words and highlight them

Hi,
None of the repeated words are getting highlighted. I have cross checked with the exclusion list. For example...the word "document" is appearing many times. However, none of these words get highlighted. Also, no issue if "C/C++" does not get highlighted.

Thanks,
Shail
 
#20 ·
Re: Find repeated words and highlight them

It does when I run the code. See attached. Below is a revised version of the 'ConcordanceBuilder' sub. I've added an inclusions list for expressions that get munged by the pre-processing.
Code:
Sub ConcordanceBuilder(wdDoc As Document)
Dim StrIn As String, StrTmp As String, StrIncl As String, StrExcl As String
Dim i As Long, j As Long, k As Long
'Define the exlusions list
StrExcl = "a,am,and,are,as,at,b,be,but,by,c,can,cm,d,did,do,does,e,eg,en,eq,etc,f," & _
          "for,g,get,go,got,h,has,have,he,her,him,how,i,ie,if,in,into,is,it,its,j,k,l," & _
          "m,me,mi,mm,my,n,na,nb,no,not,o,of,off,ok,on,one,or,our,out,p,q,r,re,s,she,so," & _
          "t,the,their,them,they,t,to,u,v,w,was,we,were,who,will,would,x,y,yd,you,your,z"
'Define an inclusions list for terms that otherwise don't survive the initial cleanup
StrIncl = "c/c++,c#"
With wdDoc
  'Get the document's text
  StrIn = .Content.Text
  'Strip out unwanted characters
  For i = 1 To 255
    Select Case i
      Case 1 To 38, 40 To 44, 46 To 64, 91 To 96, 123 To 144, 147 To 149, 152 To 171, 174 To 191, 247
      StrIn = Replace(StrIn, Chr(i), " ")
    End Select
  Next
  'Convert smart single quotes to plain single quotes & delete any at the start/end of a word
  StrIn = Replace(Replace(Replace(Replace(StrIn, Chr(145), "'"), Chr(146), "'"), "' ", " "), " '", " ")
  'Convert to lowercase
  StrIn = " " & LCase(Trim(StrIn)) & " "
  'Process the exclusions list
  For i = 0 To UBound(Split(StrExcl, ","))
    While InStr(StrIn, " " & Split(StrExcl, ",")(i) & " ") > 0
      StrIn = Replace(StrIn, " " & Split(StrExcl, ",")(i) & " ", " ")
    Wend
  Next
  StrIn = Replace(StrIncl, ",", " ") & StrIn
  'Clean up any duplicate spaces
  While InStr(StrIn, "  ") > 0
    StrIn = Replace(StrIn, "  ", " ")
  Wend
  StrIn = " " & Trim(StrIn) & " "
  j = UBound(Split(StrIn, " "))
  For i = 1 To j
    StrTmp = Split(StrIn, " ")(1)
    'Find how many occurences of each word there are in the document
    While InStr(StrIn, " " & StrTmp & " ") > 0
      StrIn = Replace(StrIn, " " & StrTmp & " ", " ")
    Wend
    k = j - UBound(Split(StrIn, " "))
    'If there's more than one occurence, add the word to our Find list
    If k > 1 Then
      strFnd = strFnd & " " & StrTmp
    End If
    j = UBound(Split(StrIn, " "))
  Next
End With
End Sub
 

Attachments

#21 ·
Re: Find repeated words and highlight them

Attached is a Word document with the working macro code. I've simplified the code a bit, by removing the progress report - it doesn't work when the visibility of the documents being processed is set to False.
 

Attachments

Status
Not open for further replies.
You have insufficient privileges to reply here.
Top