I'm trying to extract text content from Excel using a macro. This is my code:
Dim i As Integer, j As Integer
Dim v1 As Variant
Dim Txt As String
v1 = Range("A2:C15")
For i = 1 To UBound(v1)
For j = 1 To UBound(v1, 2)
Txt = Txt & v1(i, j)
Txt = Txt & vbCrLf
But it is showing the raw characters only meaning that it doesn't show any formatting information like bold, italic, underline, etc..
I want to extract the text along with the formatting information.
Example: This is sample text
Expected output: This is sample text
Actual output: This is sample text
Can someone explain what's wrong with the code and tell if anything is wrong?
OK, let's have the algorithm from @stucharo a little bit simpler to extend.
Public Function getHTMLFormattedString(r As Range) As String isBold = False isItalic = False isUnderlined = False s = "" cCount = 0 On Error Resume Next cCount = r.Characters.Count On Error GoTo 0 If cCount > 0 Then For i = 1 To cCount Set c = r.Characters(i, 1) If isUnderlined And c.Font.Underline = xlUnderlineStyleNone Then isUnderlined = False s = s & "</u>" End If If isItalic And Not c.Font.Italic Then isItalic = False s = s & "</i>" End If If isBold And Not c.Font.Bold Then isBold = False s = s & "</b>" End If If c.Font.Bold And Not isBold Then isBold = True s = s + "<b>" End If If c.Font.Italic And Not isItalic Then isItalic = True s = s + "<i>" End If If Not (c.Font.Underline = xlUnderlineStyleNone) And Not isUnderlined Then isUnderlined = True s = s + "<u>" End If s = s & c.Text If i = cCount Then If isUnderlined Then s = s & "</u>" If isItalic Then s = s & "</i>" If isBold Then s = s & "</b>" End If Next i Else s = r.Text If r.Font.Bold Then s = "<b>" & s & "</b>" If r.Font.Italic Then s = "<i>" & s & "</i>" If Not (r.Font.Underline = xlUnderlineStyleNone) Then s = "<u>" & s & "</u>" End If getHTMLFormattedString = s End Function
To be clear, this function works only with a range containing a single cell. But it should be easy calling this function for each cell in a bigger range and concatenating the returned strings into one.
Edit by the OP:
I called the function by the below code:
Sub ReplaceFormattingTags() Dim i As Integer, j As Integer Dim rng As Range Dim Txt As String Set rng = Range("A2:C15") For i = 1 To rng.Rows.Count For j = 1 To rng.Columns.Count Txt = Txt & getHTMLFormattedString(rng(i, j)) & " " Next j Txt = Txt & vbCrLf Next i Debug.Print Txt End Sub
A messagebox does not permit formatiing without changing system defaults, which is not a starightforward approach. If you want to display formatted text in a prompt then you are probably easiest to create a userform and format the label appropriately.
For example, you can determine if a cell has bold fomatting using:
Dim isBold As Boolean isBold = v1(i, j).Font.Bold
And apply this to a userform label font using:
label.Font.Bold = isBold
If you want to output to a text (ie .txt) file then this cannot store any formatting information. The best you could hope to achieve is to create a markup style output where:
If isBold Then txt = "<b >mytext< /b>" 'Ignore the spaces Else txt = "mytext" End If
range.Font.Bold property has three return options:
v1(i, j).Font.Bold = True 'if the entire cell IS bold v1(i, j).Font.Bold = False 'if the entire cell IS NOT bold v1(i, j).Font.Bold = Null 'if the cell is PARTIALLY bold
IsNull(v1(i, j).Font.Bold) will tell you whether you have partial fomatting in a cell. Unfortunately you must then assess each character in the string individually to determine the bold characters. This function should determine where the bold formatting is switched on or off in a string contained in the
Range object passed and add the appropriate markup tag:
Function markup(rng As Range) As String Dim chr As Integer Dim isCharBold As Boolean Dim str As String Dim tempChar As Characters isCharBold = False str = "" If IsNull(rng.Font.Bold) Then For chr = 1 To rng.Characters.Count Set tempChar = rng.Characters(chr, 1) If isCharBold Then If tempChar.Font.Bold Then str = str + tempChar.Text Else isCharBold = False str = str & "</b>" & tempChar.Text End If Else If tempChar.Font.Bold Then isCharBold = True str = str + "<b>" & tempChar.Text Else str = str & tempChar.Text End If End If Next chr Else str = rng.Value End If markup = str End Function
Notice that the
Else case just returns the default string values. You can modify this approach to work for any of the
.Font properties e.g. strikethrough, underline, italic....
The framework in the OP suggests that you are assigning the contents of a range of cells into an array of type
Variant. This essentially leaves you with an unformatted string of characters in each array index. In this case you won't be able to extract any formatting from the array strings. To access the
Characters().Font.Bold property you must be operating on a
Range object so it might be best to iterate through each cell in
Range("A2:C15") directly. This could be achieved by modifying your initial code as such, so it now calls the markup function:
Sub OutputText() Dim i As Integer, j As Integer Dim rng As Range Dim Txt As String Set rng = Range("A2:C15") For i = 1 To rng.Rows.Count For j = 1 To rng.Columns.Count Txt = Txt & markup(rng(i, j)) & " " Next j Txt = Txt & vbCrLf Next i Debug.Print Txt End Sub
VBA string does not support formatting like that. It will purely take the string from the range. No formatting at all. If you want to format the string, you can not see this through msgbox.
Only way to do it would be to store it in a cell then format the cell. But then that does not give you the output in a messagebox as a formatted string.
If you are planning to then put the string in a cell with formatting, you will need to save the formatting somewhere, or copy it from the cell you got the text from. And then apply the formatting to the cell