Dr. Dobb's Journal - January 2008 - (Page 23)

were normal distributions. Evidently, statistical knowledge itself is not distributed evenly among software developers.

You will need several functions when linearizing multiple types of distributions. Each function needs only one collection of weights as its data input, and it returns a new (linearized) version of that collection. I suggest you work with generic collection interfaces, such as ICollection(Of Decimal), so that you can apply the same functions to different types of data sources. You must also specify explicit upper and lower boundaries for the desired range of output values. It also seems proper to work with decimal or real numbers, not integers; in my opinion, rounding the values to integers should be left to the UI code.

Listing Two is my attempt at linearizing a normal distribution; it is partly based on some examples on the Internet. The function calculates the standard deviation (sd) and makes the statistically sound assumption that nearly all values lie within two standard deviations of the mean, that is, in the range from -2 * sd to +2 * sd around the mean. (For a true normal distribution, that range holds about 95 percent of the values.) For each value, a new weight is then calculated on a straight line through that range.

Listing Three presents an algorithm that linearizes a Pareto distribution. This function calculates a new weight for each number by taking its natural logarithm (base e). (Diehards among us will not be satisfied with this and can use their own source data to find a transformation that renders a better approximation. Note, though, that changing only the base makes no difference in Listing Three: switching to any other base greater than 1 divides every log value by the same constant, and the linear mapping that follows cancels that factor out.) As before, the remainder of the function plots the new values on an imaginary straight line between the minimum and maximum values.

Presentation

For determining font sizes, I encountered four techniques (Listing Four). Both absolute (url1) and relative (url2) font sizes are used. You can also translate the weights to CSS class names (url3) or to multiple instances of sizing tags (url4). Undoubtedly, you will think of more alternatives that you would want your code to support.
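To make the four presentation techniques more concrete, here is a minimal sketch, not the article's Listing Four, that renders one tag from its linearized weight. The module name, the helper functions, the format strings, the "tagN" CSS class names, and the use of <big> as the sizing tag are all hypothetical choices for illustration. The URL and tag text are assumed to be HTML-safe already, and rounding happens only here, in the UI layer.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text

' Hypothetical helpers: {0} = link URL, {1} = size value, {2} = tag text.
Public Module TagMarkupSketch

    Public Const AbsoluteFormat As String = "<a href=""{0}"" style=""font-size:{1}px"">{2}</a>"
    Public Const RelativeFormat As String = "<a href=""{0}"" style=""font-size:{1}%"">{2}</a>"
    Public Const ClassFormat As String = "<a href=""{0}"" class=""tag{1}"">{2}</a>"

    ' Absolute size: the weight is treated as pixels and rounded to an integer.
    Public Function AbsoluteTag(ByVal url As String, ByVal text As String, ByVal weight As Decimal) As String
        Dim px As Integer = CInt(Math.Round(weight))
        Return String.Format(CultureInfo.InvariantCulture, AbsoluteFormat, url, px, text)
    End Function

    ' Relative size: the weight is treated as a factor, e.g. 1.5 becomes 150%.
    Public Function RelativeTag(ByVal url As String, ByVal text As String, ByVal weight As Decimal) As String
        Dim pct As Integer = CInt(Math.Round(weight * 100D))
        Return String.Format(CultureInfo.InvariantCulture, RelativeFormat, url, pct, text)
    End Function

    ' CSS class: the weight selects a class such as "tag3"; the style sheet defines the sizes.
    Public Function ClassTag(ByVal url As String, ByVal text As String, ByVal weight As Decimal) As String
        Dim level As Integer = CInt(Math.Round(weight))
        Return String.Format(CultureInfo.InvariantCulture, ClassFormat, url, level, text)
    End Function

    ' Repeated sizing tags: weight 2 becomes <big><big>text</big></big>.
    Public Function NestedTag(ByVal url As String, ByVal text As String, ByVal weight As Decimal) As String
        Dim level As Integer = CInt(Math.Round(weight))
        Dim sb As New StringBuilder()
        sb.AppendFormat("<a href=""{0}"">", url)
        For i As Integer = 1 To level
            sb.Append("<big>")
        Next
        sb.Append(text)
        For i As Integer = 1 To level
            sb.Append("</big>")
        Next
        sb.Append("</a>")
        Return sb.ToString()
    End Function

End Module
```

For example, AbsoluteTag("/tags/vb", "vb", 17.6D) yields an anchor with style="font-size:18px". Keep in mind that Math.Round on a Decimal uses banker's rounding by default (2.5 becomes 2); pass MidpointRounding.AwayFromZero if your front-end developers expect otherwise.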
I don’t address here how you create UI controls or bind HTML tags to data. I assume you can take the ideas from my examples and turn them into working code. Just remember that front-end developers are at least as stubborn as other software engineers. If you decide to create your own UI control, try to facilitate your front-end

Listing Three

    Public Shared Function FromParetoCurve( _
            ByVal weights As ICollection(Of Decimal), _
            ByVal minSize As Decimal, ByVal maxSize As Decimal) _
            As ICollection(Of Decimal)

        'Convert each weight to its log value (weights must be positive).
        Const BASE As Double = Math.E
        Dim logweights As New List(Of Decimal)
        For Each w As Decimal In weights
            logweights.Add(CDec(Math.Log(w, BASE)))
        Next

        'First, find the min and max weight.
        Dim min As Decimal = Decimal.MaxValue
        Dim max As Decimal = Decimal.MinValue
        For Each w As Decimal In logweights
            If w < min Then min = w
            If w > max Then max = w
        Next

        'Now calculate the slope of a straight line, from min to max.
        Dim slope As Double
        If max > min Then
            slope = (maxSize - minSize) / (max - min)
        End If

        'Get the value in the middle between minSize and maxSize.
        Dim middle As Double = (minSize + maxSize) / 2

        'Calculate the result for each of the weights.
        Dim output As New List(Of Decimal)
        For Each w As Decimal In logweights
            If (max <= min) Then
                'With max=min all tags have the same weight.
                output.Add(CDec(middle))
            Else
                'Calculate the distance from the minimum for this weight.
                Dim distance As Double = w - min
                'Calculate the position on the slope for this distance.
                Dim result As Double = slope * distance + minSize
                'If the tag turned out too small, set minSize.
                If result < minSize Then result = minSize
                'If the tag turned out too big, set maxSize.
                If result > maxSize Then result = maxSize
                output.Add(CDec(result))
            End If
        Next
        Return output
    End Function

Figure 1: A tag cloud.

Listing Four

    'Absolute font size
    Dim url1 As String = " {2} "
    'Relative font size
    Dim url2 As String = " {2} "
    'CSS class name
    Dim url3 As String = " {2} "
    'Multiple instances of sizing tags
    Dim url4 As String = " {2} "

Figure 2: Linearization of source data.
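For the normal-distribution case described earlier, a minimal sketch in the style of Listing Three might look like the following. It is not Listing Two itself: the module and function names are hypothetical, it uses the population standard deviation, and it assumes, as the text does, that nearly all weights fall within two standard deviations of the mean. That range is mapped linearly onto minSize..maxSize, and the few outliers beyond it are clamped.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical sketch, not the article's Listing Two.
Public Module TagCloudSketch

    Public Function FromNormalCurve( _
            ByVal weights As ICollection(Of Decimal), _
            ByVal minSize As Decimal, ByVal maxSize As Decimal) _
            As ICollection(Of Decimal)

        Dim output As New List(Of Decimal)
        If weights.Count = 0 Then Return output

        'Mean of the weights, computed in Decimal to avoid rounding noise.
        Dim sum As Decimal = 0D
        For Each w As Decimal In weights
            sum += w
        Next
        Dim mean As Decimal = sum / weights.Count

        'Population standard deviation.
        Dim squares As Decimal = 0D
        For Each w As Decimal In weights
            squares += (w - mean) * (w - mean)
        Next
        Dim sd As Double = Math.Sqrt(squares / weights.Count)

        'Assume nearly all weights lie between mean - 2 * sd and mean + 2 * sd.
        Dim low As Double = mean - 2 * sd
        Dim high As Double = mean + 2 * sd
        Dim middle As Double = (minSize + maxSize) / 2

        For Each w As Decimal In weights
            If sd = 0 Then
                'All weights are equal: every tag gets the middle size.
                output.Add(CDec(middle))
            Else
                'Position of w on a straight line from low to high.
                Dim result As Double = minSize + (w - low) * (maxSize - minSize) / (high - low)
                'Clamp outliers beyond two standard deviations.
                If result < minSize Then result = minSize
                If result > maxSize Then result = maxSize
                output.Add(CDec(result))
            End If
        Next
        Return output
    End Function

End Module
```

With weights 1 through 5 and a range of 10 to 20, the middle weight lands at (approximately) 15 and the others spread symmetrically around it. Dividing by n - 1 instead of n (the sample standard deviation) would widen the range slightly, which may suit small tag sets better.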