To apply the proposed simulation, a simulated heterogeneous crowd of approximately 180,000 members is utilized (Wang & Zaniolo 2015). This database is a simulated temporal data set used to model employees within an organization, originally created to test database systems. Information about individuals includes their department, date of employment, age, salary and title. This information has been used in this work to provide additional attributes to the individuals of our generated crowd.
From the crowd, design teams are formed utilizing a subdivision of unique members who focus on specific design initiatives. The results are reviewed with distinct emphasis placed on network centrality, density and size. The following simulation study also includes a parametric analysis to understand the overall effects of the variation of team generation variables, such as team size and communication link threshold, utilized within the development. The inclusion of these variables allows for a more in-depth look at potential network structures as the network characteristics change. While the results cannot guarantee strict network development considerations, they highlight the usage of the simulation framework and its potential for team design.
4.1 Individual organization
The design capability of each group is evaluated based on the combination of skills in each developed team. While each individual may possess overlapping abilities across a range of disciplines, the decomposition of their abilities is represented to allow for a direct mapping between project goals and individual attributes.
To begin the analysis, we simulate 1,000 design teams, in which the organizational structure is an outcome of the random intersection model being applied. The first developed networks to be studied consist of 25 members, with a complete random intersection model and a communication link generation threshold of three. Application of a threshold of three indicates that two individuals must have a commonality of 25% within their pre-assigned traits, leading to networks of greater connectivity when compared to networks of higher commonality requirements. A level of three was also chosen to obtain a better understanding of how collaborations impact the design success. The following sections explore the resulting design scores and their distribution; overall network characteristics, including closeness, betweenness, eigenvector centrality, diameter, density and degree; and the top and worst performing design teams.
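The threshold rule described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the trait pool size, the uniform random trait assignment, and the function names `assign_traits` and `build_links` are assumptions. Only the 25-member team size, the threshold of three, and the figure of 12 traits per member (implied by three shared traits being 25%) come from the text.

```python
import random
from itertools import combinations

def assign_traits(n_members, traits_per_member=12, pool_size=30, seed=0):
    """Give each member a random set of trait IDs.

    pool_size is a hypothetical value; the source does not state it.
    """
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(pool_size), traits_per_member))
            for _ in range(n_members)]

def build_links(traits, threshold):
    """Link two members when they share at least `threshold` traits."""
    adjacency = {i: set() for i in range(len(traits))}
    for i, j in combinations(range(len(traits)), 2):
        if len(traits[i] & traits[j]) >= threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency

traits = assign_traits(25)
team = build_links(traits, threshold=3)
n_links = sum(len(nbrs) for nbrs in team.values()) // 2
print(f"{n_links} links among {len(team)} members")
```

For a fixed set of trait assignments, lowering the threshold can only add links, which is why lower thresholds produce networks of greater connectivity.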
The potential number of combinations of individuals with a team size of 25 from a pool of 180,000 unique members is approximately $\binom{180{,}000}{25} \approx 1.55 \times 10^{106}$, which does not even account for the possible connections that can be formed. The authors recognize that the generation of 1,000 teams only captures a very small fraction of the possible number of teams and connections available. Such a large search space lends itself to a more directed search using heuristic algorithms to intelligently search the solution space. This concept is currently being explored in other work (Ball & Lewis 2017); however, the primary motivation of this work is the exploration of network analysis metrics and how they relate to a variety of team formations and information flow characteristics.
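The size of this search space can be checked directly with exact integer arithmetic. The sketch below only counts unordered selections of members and the number of member pairs within one team; it makes no assumption about how links are actually formed.

```python
import math

pool_size = 180_000   # members in the simulated crowd
team_size = 25        # members per design team

# Number of distinct 25-member teams that can be drawn from the pool.
n_teams = math.comb(pool_size, team_size)

# Number of member pairs (potential communication links) within one team.
n_possible_links = math.comb(team_size, 2)

order_of_magnitude = len(str(n_teams)) - 1
print(f"possible teams: about 10^{order_of_magnitude}")
print(f"potential links per team: {n_possible_links}")
```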
4.1.1 Design score
To demonstrate that the team generation captured a wide variety of team compositions, Figure 3 shows the distribution of the design scores.
Figure 3. Histogram of design scores.
The distribution of the simulated design scores approximately follows a normal distribution with an average of 936.5 and a standard deviation of 159.6. The top score from this set of teams is 1415.5 and the worst performing team scores 526.6. It is important to obtain complete coverage of design team potentials, as this allows for teams of widely varying abilities to be studied. While the sample of teams represents only a limited portion of the potential combinations, given the approximately normal distribution, the authors believe that the results found can be considered characteristic of the pool and model being utilized. In the following sections, overall trends in network composition are discussed.
4.1.2 Network characteristics
Closeness centrality, shown in Figure 4(a), represents the number of information channels necessary for one individual to reach another individual and not the physical closeness of each individual. High physical closeness in a distributed mass collaboration context is rarely possible, as members of the crowd are widely dispersed and must be able to work with other individuals regardless of their location. Average closeness centrality follows a positive linear correlation with respect to design score.
Figure 4. Network graph centrality indicators.
This result yielded an R-squared value of 62.3%, indicating a reasonably strong correlation. Another thing to note is that this result excludes teams with a closeness value of zero, indicating the presence of isolates or incomplete network graphs. From this result, we conclude that increased levels of closeness centrality, indicating a shorter geodesic path between members, generate design teams with higher design potential. This idea points to the development of new information channels for members who currently experience individual levels of low closeness centrality, following the notion that direct lines of communication between members help to improve the collaborative efforts of the design team.
It is also worth noting that closeness centrality assumes that the item being passed between the edges follows the shortest path between nodes. Because of this assumption, the correlations found for this metric are most applicable to items that are spread in series, such as CAD models or shared design variables, as opposed to concepts or ideas, which adhere to more of a parallel duplication process.
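As a rough illustration of the shortest-path assumption, average closeness can be computed with a breadth-first search from every member. The normalization used here, (n - 1) divided by the sum of geodesic distances, is one common convention and may differ from the one behind Figure 4(a). Returning zero for an incomplete graph is an assumption that mirrors the zero-valued teams mentioned above, and the example graphs are made up.

```python
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def average_closeness(adjacency):
    """Mean closeness over all nodes; 0.0 if the graph is not connected."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    total = 0.0
    for node in adjacency:
        dist = shortest_path_lengths(adjacency, node)
        if len(dist) < n:          # some member is unreachable (assumed convention)
            return 0.0
        total += (n - 1) / sum(dist.values())
    return total / n

# Illustrative graphs (not taken from the study).
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
complete = {i: {j for j in range(4) if j != i} for i in range(4)}
print(average_closeness(star), average_closeness(complete))
```

The fully connected team scores higher than the star because every member reaches every other member in a single hop.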
Betweenness centrality, shown in Figure 4(b), exhibits a negative linear correlation with design score. The R-squared value of 40.7% is lower than those of the other centrality metrics; however, betweenness still has a noticeable effect on design performance.
Higher levels of betweenness centrality are known to indicate critical nodes, as they fall within the information paths of multiple adjacent nodes (Borgatti & Everett 2006). These results show that teams with higher potential design scores contain fewer critical nodes. This result is promising, as teams with better design potential do not rely on any individual node to transfer large flows of information. The removal of any individual node would not have a significant impact on the entirety of the team.
As with closeness centrality, betweenness centrality assumes that information follows the shortest path between nodes. In the context of an open design initiative, this may not always hold true for concepts or ideas, indicating that nodes of high betweenness generally control the flow of specific items as opposed to ideas.
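To make the notion of a critical node concrete, the sketch below computes unnormalized betweenness for an unweighted, undirected graph using Brandes' algorithm. The source does not say which algorithm or normalization produced Figure 4(b), so this is one standard option and not a reconstruction of the authors' tooling. The example graphs are made up.

```python
from collections import deque

def betweenness(adjacency):
    """Unnormalized shortest-path betweenness (Brandes' algorithm)."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        order = []                               # nodes in BFS visiting order
        preds = {v: [] for v in adjacency}       # shortest-path predecessors
        sigma = {v: 0 for v in adjacency}        # number of shortest paths from s
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:                  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:       # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}      # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

# Illustrative graphs (not taken from the study).
path = {0: {1}, 1: {0, 2}, 2: {1}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(betweenness(path), betweenness(star))
```

In the star, every shortest path between two outer members passes through the hub, so removing the hub disconnects the team. This is the kind of dependence on a single node that the higher-scoring teams avoid.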
Eigenvector centrality, as shown in Figure 4(c), exhibits a positive correlation to design score with an R-squared value of 56.4%. Higher levels of eigenvector centrality lead to members of the design team having strong influences on other members of the design team. This consideration can also point to ‘group think’, reducing the variety of ideas and limiting innovation, as members of the design team can be persuaded to agree with influential members so as to follow the sentiment of the group.
Unlike closeness and betweenness, eigenvector centrality does not rest on the assumption of shortest path flows. Because of this, the results shown for eigenvector centrality apply to the transfer of ideas and concepts regarding designs. This implies that the observation of a positive correlation supports the diffusion of design concepts within the team.
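Eigenvector centrality can be approximated by power iteration, in which each member's score is repeatedly replaced by the sum of its neighbours' scores. The sketch below is a generic version with an added identity shift (each node also keeps its own score) so that the iteration converges on bipartite graphs. The shift, the Euclidean normalization, and the tolerance are choices made for this example, not details from the source.

```python
import math

def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-10):
    """Power iteration on (A + I); returns a unit-length score vector."""
    x = {v: 1.0 for v in adjacency}
    for _ in range(max_iter):
        # Each node keeps its own score and adds its neighbours' scores.
        new = {v: x[v] + sum(x[w] for w in adjacency[v]) for v in adjacency}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in adjacency) < tol:
            return new
        x = new
    raise RuntimeError("power iteration did not converge")

# Illustrative graph (not taken from the study): one hub and three outer members.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
scores = eigenvector_centrality(star)
print({v: round(s, 3) for v, s in scores.items()})
```

Because a node's score depends on the scores of its neighbours rather than on path lengths, the measure rewards members who are connected to other well-connected members.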
Degree centrality, shown in Figure 4(d), shows a positive linear correlation between the average degree of the design team and the potential design score, with an R-squared value of 68.3%, indicating a relatively strong relationship.
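The R-squared values quoted for the centrality metrics come from simple linear fits of design score against one metric at a time. A minimal least-squares sketch is given below. The function name and the toy data are illustrative only and do not reproduce any value from the study.

```python
def linear_fit_r_squared(x, y):
    """Ordinary least-squares line y = intercept + slope * x and its R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up numbers purely to exercise the function.
metric = [1.0, 2.0, 3.0, 4.0]
score = [2.0, 1.0, 4.0, 3.0]
slope, intercept, r2 = linear_fit_r_squared(metric, score)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.2f}")
```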
Higher degree for each individual member, however, increases the amount of information flow each individual must be responsible for. As the degree increases, the level of individual involvement increases as they now receive information from additional members. This metric must be carefully leveraged so as to increase the degree of members where it is most advantageous for the entire network.
Degree centrality is also an indication of the immediate influence of a given node. In the context of a design effort, immediate influence is shown through the direct sharing of design variables or CAD models between two collaborating individuals. Since this metric only considers nodes directly incident upon one another, increased levels of degree centrality lead to more connected designs.
Three discrete values for diameter are observed, as shown in Figure 5(a). Networks of lower diameter provide stronger design potential. It is noted that the top performing design team had the lowest potential diameter while the highest values of network diameter all contained design scores that fell below the average. While diameter does not appear to be a strong indicator of design score, it can be concluded that networks where information channels between members must include multiple intermediary nodes perform worse.
The final network characteristic reviewed is the density of each network. From Figure 5(b), another strong positive linear correlation is observed as teams with greater density tend to generate higher design score potentials.
Figure 5. Network graph characteristics.
The density of a network is in direct relation to the average degree centrality of that network. The overall impact of density on design scores is a result of the same network dynamics as when average degree centrality was considered. However, density is an overall network characteristic while degree is individualized.
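The relation between the two quantities is exact for a simple undirected network of n members: density equals the average degree divided by n - 1. The sketch below checks this on a small made-up graph and then applies it to the density values of 0.833 and 0.397 reported for the top and worst performing 25-member teams. The derived average degrees are back-of-the-envelope figures and are not values read from the tables.

```python
def density(adjacency):
    """Fraction of possible undirected links that are present."""
    n = len(adjacency)
    links = sum(len(nbrs) for nbrs in adjacency.values()) / 2
    return links / (n * (n - 1) / 2)

def average_degree(adjacency):
    """Mean number of links per member."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)

# Illustrative graph: a square with one diagonal (5 of 6 possible links).
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
n = len(graph)
print(density(graph), average_degree(graph) / (n - 1))

# Implied average degree for a 25-member team at the reported densities.
for reported_density in (0.833, 0.397):
    print(reported_density, round(reported_density * (25 - 1), 1))
```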
A multiple linear regression was also performed with respect to the design score and the four centrality metrics. However, due to the high multicollinearity between predictors, measured by their variance inflation factors (Craney & Surles 2002), the results were excluded from this work.
4.1.3 Top performing network
The top performing design team received a design score of 1415.45. This network was very well connected, with strong connections between members of similar disciplines allowing for increased collaboration efforts, as shown in Figure 6. This network had a density of 0.833, indicating that approximately 83% of the potential connections between nodes were utilized. It also had a diameter of two, allowing for a close flow of information between all members.
As shown in Figure 6, node size is proportional to its degree, as it was previously determined that degree held the highest statistical significance when considering levels of network centrality. The top performing team has multiple nodes of high individual degree, supporting the spread of information within the network.
The colors for each node represent the cluster in which they belong. This network developed two distinct clusters of individuals represented by the pink and green nodes using a community detection algorithm based on a heuristic optimization approach to find high-modularity partitions, outlined in Blondel et al. (2008).
Figure 6. Top performing team network structure.
Table 5 highlights all network metrics attributed to the network graph of the top performing design team. Each of the network parameters outlined falls well above the average values determined from the entire simulation of the 1,000 teams.
Table 5. Network properties of the top performing team
Another consideration regarding the top performing team is related to the number of members from each discipline. This team is composed of a relatively even distribution of members, with the exception of only one marketing specialty and two research members, as shown in Table 6. The wide variety of disciplines allows for a well distributed design effort. The exact combination of individuals leading to the most successful designs would depend on the design task being considered.
Table 6. Individual members on the top performing team
4.1.4 Worst performing network
The worst performing design team received a design score of 526.64. From the network graph shown in Figure 7, it is observed that multiple members had a degree of one or two, indicating that they were partially removed from the design effort. This led to poor information sharing from these members, thus decreasing the team's design score. This network also had a density of only 0.397, indicating that only approximately 40% of all possible connections were utilized.
When reviewing the network graph for the worst performing team, it is observed that there are fewer nodes of individually high degree, as the average degree for this team was significantly lower. It is also observed that there are four clusters that have formed, one of which is the isolated development engineer represented by the blue node. This lack of connectivity negatively impacts the team’s performance.
Figure 7. Worst performing team network structure.
Table 7 highlights all network metrics attributed to the network graph of the worst performing design team. Each of these network properties presented falls below the average values determined from the complete simulation of the 1,000 teams.
Table 7. Network properties of the worst performing team
When reviewing the worst performing team for the distribution of individual members, it is observed that this team heavily consists of development engineers, with zero quality assurance engineers, as shown in Table 8. Because of this breakdown, the team does not sufficiently capture the entire design process, creating a poor overall design.
Table 8. Individual members on the worst performing team
Comparison of the top performing team with the worst performing team furthers the idea that teams with greater connectivity, increased skill distribution and increased levels of information flow tend to create higher potential design scores. Another characteristic that is significantly different between the top and the worst teams is the experience level and variety of individuals on the team. The top performing team has 20 senior members, indicating greater ability in their respective disciplines, while the worst performing team only has 14 senior members.
4.1.5 Network generation comparisons
Next, we consider the impact of the communication link generation method, considering randomly formed networks, probabilistically guided networks and directed networks.
Figure 8. Network generation method statistics.
As shown in Figure 8(a), the network formed from random trait assignment consistently provided the most effective design team, with the probabilistically guided network receiving the second highest average marks and the directed network performing the worst of all three. These results, however, must be carefully interpreted as the average density, Figure 8(b), indicates that directed networks were also the least connected networks. It was previously identified that there exists a strong positive correlation between design score and network density, potentially leading to the variation in design scores observed.
The decreased overall design scores can be explained by the limited potential for collaboration efforts, expressed through a decreased network density, as the probability of communication links was decreased. The probabilistically guided and directed networks limit the overall amount of potential connections, as these are now dependent on the variety of members within the design team. The limited potential for communication links can also be quantified by the average degree of each network, as it was only 19.3 for the partially random network compared with 28.9 for the fully random network generation. The directed network is the most restrictive as it only allows for individuals with the same background knowledge to communicate. The decreased design score is primarily attributed to the decrease in the probability of collaboration.
Figure 9 illustrates a summary of the network metrics across the three types of network formations. Random trait assignment led to teams with the highest ability and networks of the greatest density. Partially random networks have the highest levels of betweenness centrality and the largest average diameter.
The directed network received average closeness and betweenness values of zero, as there were isolated groups within the networks. These isolated groups meant that members of one group could not communicate with members of any other group, so no member could reach the entire network.
4.2 Parametric analysis
To further understand the effects of network construction, a parametric analysis is performed to observe the impact of varying levels of team size and communication link generation on design score and network structure characteristics.
As expected, it is observed that the design score increases with increased team size and decreased threshold value, as shown in Figure 10. As the number of individuals on each team increases, along with their probability of communication, shown through decreasing threshold values, the design scores also increase.
4.2.1 Network characteristics
Closeness centrality, shown in Figure 11(a), is studied relative to varying team size at constant lines of threshold values. Threshold values of one allow for networks of much greater density as individuals require only one trait similarity before links are formed, while threshold values of six require individuals to share half of their traits before they collaborate. As the team size increases, the average closeness centrality for each group decreases for threshold values of one and two. When looking at higher threshold values, the closeness centrality remains constant at zero due to incomplete network graphs, with the minor exception of insignificant closeness levels for a threshold of three with team sizes of under 30 members.
Figure 9. Summary of network statistics.
Figure 10. Design scores with 95% confidence intervals.
Focusing on threshold levels of one and two, it can be concluded that network closeness decreases as team size increases. This is a result of increased team size leading to longer geodesic paths between individuals. With the addition of team members, the average distance between nodes increases, as communication between members now spans a greater number of individuals. This result indicates that as teams are formed with significantly different individuals, their networks become less centralized.
When observing the betweenness centrality, shown in Figure 11(b), it is evident that as team size increases, the betweenness centrality of each network also increases. As additional members are introduced to the design team, connections between members have a greater chance of passing through other members, causing the average betweenness of the entire network to increase. Based on this, caution must be taken when developing networks of large team size as these contain additional critical nodes that control large information flows.
The impact of threshold level illustrates a curious result, as a threshold of three creates the most impactful change across varying team sizes, while a threshold of one has a less profound effect. Lower threshold levels lead to a greater overall probability of developing connections between individuals. Because of this increased probability, we observe that with a threshold of one, the betweenness does not increase as quickly as when looking at thresholds of two, three and four since the increase in connections also supports the development of direct lines of communication. When considering higher thresholds, the probability of an increased number of additional communication links decreases, also decreasing the number of direct paths between individuals, forcing information flows to pass through intermediary nodes.
We observe that at 25 individuals, the betweenness centrality of networks with a threshold of three begins to increase above that found for a threshold of two. This phenomenon can most likely be attributed to the decreasing probability of additional direct lines of communication in higher thresholds, causing networks with a threshold of three to increase betweenness at a greater rate as more individuals are added to the network. It is also observed that thresholds of five and six only exhibit a very minor impact as the probability of new communication links remains low, not significantly impacting each individual’s betweenness level.
Figure 11. Social network metrics with 95% confidence intervals.
An increase in team size decreases the eigenvector centrality of each design team, as shown in Figure 11(c), following a decreasing power regression line. Because this correlation follows a power regression, changes in team size have a much greater impact on eigenvector centrality for smaller teams than for larger ones. At smaller team sizes, eigenvector centrality is high because the limited pool of potential member connections raises the probability of influential members connecting to other influential members. As team sizes increase, this probability decreases, reducing eigenvector centrality to a point where adding team members has a negligible effect.
The correlation between average network degree and team size is shown in Figure 11(d). Degree centrality increases linearly with respect to team size as the number of potential connections increases due to the increased probability of other team members sharing the required number of traits for a connection.
Comparison of team size with network diameter, as shown in Figure 12(a), reveals results that do not show much discernible pattern across both team size and communication thresholds. It is also observed that these values had much greater variability, notably limiting any statistically supportable conclusions. One observation that can be noted, although cautiously, is that the average values of diameter appear to increase asymptotically toward a constant. This constant also appears to increase with increasing threshold value. Thresholds of one and two reach their constant values, diameters of 2 and 3 respectively, immediately. A threshold of three requires a team size of approximately 25 before a constant diameter of 4 is reached and a threshold of four reaches a constant value of approximately 5 at a team size of 55. It also appears that the deviation in the results is a function of team size as a threshold of one has a near-zero level of deviation for a team size of 60 or above. Based on this trend, the authors believe that if additional larger teams were considered, all threshold values would begin to settle at a constant diameter with decreasing levels of variability.
Networks generated with thresholds of three and four both follow increasing patterns along a positive power regression. Networks of threshold level three level off at a network diameter of four and networks of threshold level four level off at a network diameter of five. The increasing diameter at smaller team sizes for these two threshold values is caused by incomplete network graphs. With a low number of team members and a higher threshold value, isolates and disconnected clusters form, and these incomplete graphs are assigned a network diameter of zero, which pulls down the average. As team size grows and networks become more connected, the incomplete graphs begin to disappear, causing the average diameter to level out around the constant value for each threshold. For example, this phenomenon occurs at 20 individuals for a threshold of three.
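The effect of incomplete graphs on the averaged diameter can be illustrated with a small sketch. It follows the convention described in the text, where a network with isolates or disconnected clusters has a diameter of zero. The function names and the two example graphs are assumptions made for illustration.

```python
from collections import deque

def hop_distances(adjacency, source):
    """Breadth-first search distances from source to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def diameter(adjacency):
    """Longest geodesic in the graph; 0 when the graph is not connected."""
    n = len(adjacency)
    longest = 0
    for source in adjacency:
        dist = hop_distances(adjacency, source)
        if len(dist) < n:          # incomplete graph: assumed zero diameter
            return 0
        longest = max(longest, max(dist.values()))
    return longest

# Illustrative graphs (not taken from the study).
connected = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}   # path of four members
incomplete = {0: {1}, 1: {0}, 2: {3}, 3: {2}}        # two separate pairs
diameters = [diameter(connected), diameter(incomplete)]
print(diameters, sum(diameters) / len(diameters))
```

Averaging a connected team of diameter three with an incomplete team scored as zero gives 1.5, which shows how a high share of incomplete graphs at small team sizes depresses the mean diameter.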
For networks with thresholds of five and six, the impact of these incomplete graphs is much more prominent. These values appear to increase linearly with respect to team size; however, they are expected to smooth out, in a similar fashion to thresholds of three and four, as the number of incomplete graphs decreases.
Figure 12. Network graph characteristics with 95% confidence intervals.
When reviewing team size against network density, Figure 12(b), it is observed that there is no discernible effect of team size on network density. As the team size increases, the network density across constant threshold lines remains relatively constant. It is observed that network density decreases significantly with increased threshold level. This is due to the increased potential for communication links when the threshold value is low.
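One way to see why density depends on the threshold but not on team size is that, under a trait-sharing rule, the chance that any given pair is linked depends only on the two members' traits. The sweep below is a sketch under stated assumptions: 12 traits per member drawn uniformly from a hypothetical pool of 30, and 50 teams per setting. It is not the authors' parametric study and will not reproduce the values in Figure 12(b).

```python
import random
from itertools import combinations

def mean_density(team_size, threshold, n_teams=50, seed=0,
                 traits_per_member=12, pool_size=30):
    """Average density over randomly generated teams (hypothetical trait model)."""
    rng = random.Random(seed)
    possible_links = team_size * (team_size - 1) / 2
    total = 0.0
    for _ in range(n_teams):
        traits = [frozenset(rng.sample(range(pool_size), traits_per_member))
                  for _ in range(team_size)]
        # Count the pairs that share at least `threshold` traits.
        links = sum(1 for a, b in combinations(traits, 2)
                    if len(a & b) >= threshold)
        total += links / possible_links
    return total / n_teams

for threshold in (1, 3, 6):
    row = [round(mean_density(size, threshold), 2) for size in (10, 25, 50)]
    print(f"threshold {threshold}: densities for sizes 10/25/50 = {row}")
```

Because the same seed yields the same trait assignments for a given team size, raising the threshold in this sketch can only remove links, so density never increases with the threshold.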
Based on these results, it is preferable to support each design team with additional lines of communication to allow for greater sharing of design activities, especially for teams of greater size. When considering crowdsourced design, this could come in the form of using members with similar traits and complementary abilities. With the traits of individuals being used to generate connections between them, crowdsourced networks would benefit from the combination of individuals who share common interests to support collaboration efforts. Thus, it is advantageous to develop additional modes of communication between members, potentially through increased content sharing or trait matching.
While this work allows for the initial analysis of simulated design teams, there exist further possibilities to extend this simulation framework to generate a more robust and adaptive model, allowing for the theoretical performance validation of the framework. Currently, individual abilities are restricted to broad estimations of their overall competencies. Further understanding of the specific attributes of each individual and how these map to design improvements is required before self-organizing mass collaboration efforts can expand.
Additionally, it is important to note that increased connections between individuals are not penalized, supporting the trend of increased communication between members. In practice, additional communication links can increase time of development and costs. This work reviews a design initiative under constant iteration, similar to open source projects, such that development time is assumed to not significantly impact the design score.