

Bioscience Horizons Advance Access originally published online on April 8, 2009
Bioscience Horizons 2009 2(2):99-112; doi:10.1093/biohorizons/hzp012

© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Measuring interdisciplinary research: analysis of co-authorship for research staff at the University of York

Leana Bellanca*

Department of Biology, University of York, Heslington, York, North Yorkshire YO10 5YW, UK

* Corresponding author: Email: leana.bellanca{at}googlemail.com

Supervisors: Dr Daniel Franks and Dr Leo Caves, York Centre for Complex Systems Analysis, Department of Biology, PO Box 373, University of York, Heslington, York, North Yorkshire YO10 5YW, UK.


    Abstract
 Top
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Conclusions
 Funding
 References
 Author Biography 
 
Collaboration allows researchers to combine the strengths of different disciplines to undertake research that neither could do individually. Scientific collaboration can be examined by analysing patterns of co-authorship of papers in publication databases (e.g. Web of Science) using methods from Social Network Analysis. In this project, I describe three networks consisting of researchers in the Biology and Chemistry Departments at the University of York to investigate degree, degree distribution, key brokers and the preference of researchers for collaborating within or outside their own research field. Clustering (or transitivity) was used to describe whether collaboration is more likely if two researchers have a collaborator in common. As a control, and to assess the significance of the results, a network consisting of 98 researchers from the Chemistry and Biology departments was produced and compared with a distribution of 1000 ER random graphs for degree, transitivity and betweenness. Researchers in the Department of Biology (50 researchers) had fewer collaborations with their departmental colleagues than those in the Department of Chemistry (45 researchers): the average number of links each researcher had with others was 2.6 in the Biology collaboration network and 4.8 in the Chemistry collaboration network. Researchers within the Chemistry department were also more likely than their colleagues in Biology to collaborate with another researcher if they had a collaborator in common. One aim of the study was to characterize the extent of interdisciplinary research within the Department of Biology. Staff in the Biology department were categorized into distinct research foci, indicating the discipline of the researcher. There were many links from the Bioinformatics and Mathematics, and Biophysics and Biochemistry foci, to other foci, implying that staff within these foci were interdisciplinary in their research, which is indicative of their role in providing techniques or tools that are applicable across discipline boundaries. This sort of analysis provides quantitative evidence for understanding the social patterns of scientific collaboration and may be a useful tool in the development of strategies to promote interdisciplinary research within research institutions.

Key words: collaboration, social network analysis, degree, random graphs, scientific research


    Introduction
 
Scientific researchers aim to investigate, interpret and revise knowledge of the world. A group of scientists who work together may form a social network.1 Collaboration may be utilized to produce a funding proposal, co-author a scientific paper or share ideas through informal discussion. Collaboration is often used to undertake interdisciplinary research. Interdisciplinary research aims to combine the strengths of several disciplines to create a new discipline allowing researchers to undertake research that neither could do independently. Funding bodies sometimes consider grant proposals from groups of researchers whose research incorporates an interdisciplinary approach to solve large complex problems. Interdisciplinary research bridges gaps in terminology, approach and methodology, generating new approaches to thinking.2

Interdisciplinary Research
Collaboration allows scientists to pool resources and share the cost of expensive scientific equipment and trained specialists. The knowledge-based view3 suggests that financial resourcing affects how scientific collaborations form. An organization is likely to develop resources internally, such as expertise and expensive equipment, if they are used regularly. In comparison, if expertise is expensive to develop internally, collaboration with another organization may be beneficial. However, the co-ordination costs involved in multi-centre collaborations between scientists, for example travel costs to attend meetings with overseas collaborators, may present a barrier to interdisciplinary research. Frequent communication between collaborators is often associated with greater trust, increased output (i.e. scientific publications) and greater value for money.3 Sharing resources such as a common website or database between collaborators may spread the cost of data handling and communication for each investigator and potentially result in improved, systematic methods and standardized measurements. Collaboration must occur within a work and reward structure largely focused on individual achievement and reputation. The number of publications may be a factor when considering a researcher for promotion.4

Network Theory
A pair of researchers has a relationship if they have collaborated to publish a scientific paper together.5 A graph can be produced to describe collaborations between many pairs of researchers using network theory and bibliometrics. Bibliometrics is the analysis of scientific publication records. Each researcher in a graph is a node, and nodes are connected by edges or links. Each edge can be weighted to indicate the number of times a researcher pair has collaborated. A node can be characterized by its degree (k) and attribute,6 with hubs being nodes of high degree. The degree is the number of links the node has to other nodes, whereas the attribute is an intrinsic characteristic of the node such as a researcher's research interest. The degree distribution is the number of nodes with a particular degree. A node can also be characterized by its betweenness. Betweenness measures the number of times a researcher is an intermediary in the path between two researchers.7 A node with high betweenness may identify a researcher as a broker, able to initiate or hinder communication flow through a network.8 Path length is the number of edges between two researchers.9 Links can be directed or undirected. For example, in a directed network, node X is parent to node Y. In an undirected graph, each node/collaborator is perceived as an equal contributor to the relationship.10
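The definitions above can be illustrated with a minimal Python sketch. It is not the Bibexcel and UCINET workflow used in this study; the author lists, researcher labels and function names are hypothetical. It counts how many publications each pair of researchers shares (the edge weight) and then derives each node's degree.

```python
from collections import Counter
from itertools import combinations

def coauthor_weights(papers):
    """Count shared publications for every unordered pair of researchers."""
    weights = Counter()
    for authors in papers:
        # sorted(set(...)) gives each unordered pair a single canonical key
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights

def degrees(weights):
    """Degree k = number of distinct collaborators of each node."""
    k = Counter()
    for a, b in weights:
        k[a] += 1
        k[b] += 1
    return k

# Hypothetical author lists, one per publication
papers = [["R1", "R2"], ["R1", "R2", "R3"], ["R3", "R4"]]
w = coauthor_weights(papers)
k = degrees(w)
```

Here R1 and R2 share two publications, so their edge has weight 2, while R3 has three distinct collaborators and therefore a degree of 3.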

Network Models
The Erdős–Rényi (ER) graph describes a random network whose degree distribution follows a Poisson distribution.3, 10 That is, most nodes have a degree close to the network's average degree. Random graphs have a small mean shortest path length and a small clustering coefficient. The clustering coefficient (or transitivity) measures the probability of a researcher pair collaborating if they have a collaborator in common. A small mean shortest path length means each researcher can reach every other by a small number of links.11 An advantage of having a small mean shortest path length is that communication can flow through a network quickly. A small world network more accurately reflects a scientific collaboration network than a random graph.1 Small world networks have a small mean shortest path length and a clustering coefficient significantly higher than that of an ER random graph.9 Thus, in a small world network, nodes tend to group together.
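The two quantities that distinguish random graphs from small world networks can be computed directly. The following Python sketch is illustrative only: the graph is hypothetical, and the clustering coefficient is taken here as the fraction of neighbour pairs that are themselves linked, which is one common definition and an assumption about the form used in this article.

```python
from collections import deque
from itertools import combinations

def transitivity(adj):
    """Fraction of neighbour pairs (paths a-v-b) whose end nodes are also linked."""
    closed = total = 0
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            total += 1
            closed += b in adj[a]
    return closed / total if total else 0.0

def mean_shortest_path(adj):
    """Mean number of edges between all reachable ordered pairs, by breadth-first search."""
    dist_sum = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs if pairs else 0.0

# Hypothetical undirected graph: a triangle A-B-C plus a pendant node D attached to C
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
clustering = transitivity(adj)
mean_path = mean_shortest_path(adj)
```

For this small graph, three of the five neighbour pairs are linked, so the clustering coefficient is 0.6, and the mean shortest path length is 4/3 edges.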

Interdisciplinary Research at the University of York
A recent report by the University of York, Information Needs for a World Class University, identifies information needs for the promotion of interdisciplinary research.12 This report maintains that research at the University of York is moving towards interdisciplinary collaboration. Collaboration becomes necessary as the university competes in the global higher education market. The University of York identified its current information services as a barrier to internal and external collaboration. Simpler access to data and communication between researchers is needed to facilitate interdisciplinary research. An academic social networking site, currently being developed by the TRANSIT initiative13 at the University of York, will allow a researcher to undertake projects and input data into a database without multiple points of entry.

Research Aims
In this project, I describe the scientific collaboration networks in the Biology and Chemistry departments at the University of York. I define scientific collaboration as two researchers having co-authored a scientific publication together from 2001 to 2007.8 Three networks were considered to investigate degree, degree distribution, key brokers and preference of researchers for collaborating within or outside their own research field. Clustering (or transitivity) was used to investigate whether collaboration is more likely if two researchers have a collaborator in common and if clustering had any effect on the structure of the social network of scientists. A network consisting of 98 researchers from the Chemistry and Biology departments was produced to test the hypothesis that collaboration does not differ from a random distribution of 1000 ER random graphs for measures of degree, transitivity and betweenness.


    Materials and Methods
 
Publication data for researchers employed at the University of York were extracted from records contained in Web of Science,14 and obtained from the Biology Research Support Office and the Chemistry department. UCINET,15 a social network analysis tool, was used to analyse relational data statistically. Bibexcel16 was used to filter publication data to produce a .coc file that detailed the number of times each pair of researchers had collaborated. The .coc file was modified slightly in Microsoft Excel17 to produce a UCINET input file (.dl). Network analyses considered three networks separately:

  1. Biology collaboration network;
  2. Chemistry collaboration network;
  3. Biology and Chemistry collaboration network.

Compiling Researcher Lists
Lists of 85 Biology department researchers and 62 Chemistry department researchers, and a combined list of all 147 researchers, were compiled for staff at the University of York from 2001 to 2007.18, 19 Researchers who did not collaborate with their departmental colleagues were not included in the network graphs for the three networks subsequently produced. Fifty out of 85 researchers collaborated with others in the Biology department, 45 out of 62 collaborated with others in the Chemistry department and 98 out of the 147 researchers collaborated with others in the Chemistry and Biology departments. The period 2001–2007 was used because it was the period covered by the most recent Research Assessment Exercise (RAE) at the University of York. The three researcher lists for the three networks included each researcher's surname, first initial and, in some cases, second initial. Researcher lists included independent research fellows, lecturers and professors and excluded PhD students, visiting fellows and teaching fellows. For this article, researchers were designated a number from 1 to 147 for anonymity.

Biology and Chemistry Publication Data
The ISI Web of Science scientific publication database14 was used to mine journal publication records for researchers in the Biology department. The following parameters were used in the search page:

  Location: Univ* York
  Years: 2001–2007

Publication data mined from Web of Science for Biology researchers was found to be inaccurate. Difficulty was experienced in identifying researchers from the Biology department at the University of York. On many occasions, there were a number of researchers with the same initials and surname employed at the University of York. Therefore, Biology researcher publication data from the Biology Research Support Office was used because publication records had been verified by individual authors, whereas data from Web of Science had not. Duplicate publication records were removed from the Biology Research Support Office data by Endnote20 and data exported to Bibexcel. Data from the Biology Research Support Office included publications from journal publications, books, conference proceedings, newspapers, patent, personal communication, report and generic (all types of publication). It was thought that including all types of publication may have affected the number of collaborations between researchers in the three networks and the frequency at which they occur (shown by the weight of an edge in a network graph).

Web of Science was not used to analyse the Chemistry collaboration network because of inaccuracies found in the data for the Biology collaboration network. Chemistry publication data were received from the Chemistry department unformatted with each publication containing researchers' names, title, abstract, journal reference, etc. in a Microsoft Word document. Unformatted data were parsed in Awk21 by Caves22 to produce a Bibexcel output file, .out containing researcher name and publication number. All types of publication were included when the Chemistry collaboration network was produced.

UCINET: Social Network Analysis Tool
UCINET produced a square adjacency matrix that detailed the number of times each pair of researchers had collaborated.23 A .dl file was used as the input format for UCINET; it is similar to a Bibexcel .coc file.

A .dl file consisted of a data set preceded by the header:

  dl n = 50 format = edgelist1 alpha = no
  labels embedded
  data:

Here n was the number of nodes in the network, and edgelist1 specified that the data were textual. Labels were embedded because UCINET drew the labels for the rows and columns of the adjacency matrix from the first two fields in each data record. The .dl data set was imported into UCINET and saved as a REAL data set. REAL data sets are flexible because they can contain values ranging from –1E36 to +1E36. A .dl file can be visualized using the network graph visualization package Netdraw, which is distributed with UCINET.

Binary, Weighted and Symmetric Data
Symmetrical data represented each researcher as an equal contributor to the collaboration. Data were made symmetric in UCINET using the symmetrize function from the Transform menu. Binary data are where a relationship between a researcher pair is coded 1 and no relationship is coded 0. Weighted data are where a number is assigned to each edge to indicate the number of publications for a researcher pair. Weighted data were used in some measures of collaboration when the Chemistry and Biology networks were analysed separately.
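The two transformations described above can be sketched in a few lines of Python. This is an illustration rather than a description of UCINET's symmetrize function: taking the larger of the two entries for each pair is one possible rule and is an assumption here, and the matrix values are hypothetical.

```python
def symmetrize(matrix):
    """Return a symmetric copy in which each pair takes the larger of its two entries."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]

def binarize(matrix):
    """Code any positive number of joint publications as 1, otherwise 0."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]

# Hypothetical weighted matrix: rows and columns are researchers,
# entries are numbers of joint publications recorded in one direction only
weighted = [
    [0, 3, 0],
    [0, 0, 1],
    [0, 0, 0],
]
sym = symmetrize(weighted)
binary = binarize(sym)
```

After symmetrizing, the three joint publications between the first two researchers appear in both the row of the first and the row of the second; the binary matrix records only that the tie exists.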

Describing the Characteristics of Each Node: Assigning a Research Specialism to Each Node
Research foci classification data for each researcher in the Biology collaboration network were provided by the Biology Research Support Office. Analysis of research foci for the Chemistry collaboration network was not conducted because chemistry research foci data were difficult to obtain. Each researcher in the Biology collaboration network was characterized by their research field using Netdraw.24 The Transform > node attribute editor option allowed the user to insert a column to describe each node.

Generating a Random Graph
A graph of 98 researchers from a potential 147 was produced using the steps detailed in the Biology and Chemistry Publication Data, UCINET: Social Network Analysis Tool, and Binary, Weighted and Symmetric Data sections. Researchers who did not collaborate with their Biology and Chemistry colleagues were not included in the graph. These observed data were compared with 1000 ER random graphs generated using the random function from the ‘data’ menu in UCINET. The following parameters, which matched those of the observed Chemistry and Biology network data, were used to generate the ER random graphs:

  Density: 0.0406 (binary and undirected)
  Number of nodes: 98
  Number of graphs: 1000
  Type of graph: Undirected
  Data: Binary

Binary data were used to produce the Chemistry and Biology collaboration network to allow direct comparison with a frequency distribution of 1000 ER graphs. A frequency distribution was generated in Excel.
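The comparison against random graphs can be sketched in Python using the parameters listed above. This is not UCINET's generator: treating the density as an independent probability of a tie for every pair of nodes is an assumption, and the function names and the fixed seed are hypothetical choices made so the sketch is repeatable.

```python
import random

def er_graph(n, p, rng):
    """Undirected binary ER graph: each of the n(n-1)/2 pairs is linked with probability p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def mean_degree(adj):
    """Average number of ties per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def proportion_at_least(values, observed):
    """Share of random graphs whose statistic is at least the observed value."""
    return sum(v >= observed for v in values) / len(values)

rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
graphs = [er_graph(98, 0.0406, rng) for _ in range(1000)]
null_mean_degrees = [mean_degree(g) for g in graphs]
```

The same pattern applies to any statistic: compute it for each of the 1000 random graphs to obtain a frequency distribution, then use `proportion_at_least` to see how often a random graph matches or exceeds the value observed in the real network.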


    Results
 
Accuracy of Data
To illustrate inaccuracies found in the Biology collaboration network data, weighted network data mined from Web of Science (journal publications only) are shown in Fig. 1 and weighted data from the Biology Research Support Office (journal publications only) are shown in Fig. 2. Figure 1 has 45 researchers compared with 50 in Fig. 2, indicative of discrepancies between the two data sets.


 
Figure 1. The Biology collaboration network using journal publication data from Web of Science, 2001–2007. Data are weighted and undirected. There are 45 nodes. The number of publications between a pair of researchers ranges from 0 to 26. A thick line indicates the researcher pair has collaborated together many times, and a thin line means collaboration has occurred infrequently.

 


 
Figure 2. The Biology collaboration network using journal publication data from Biology Research Support Office, 2001–2007. Data are weighted and undirected. There are 50 nodes. The number of publications between a pair of researchers ranges from 0 to 15. A thick line indicates the researcher pair has collaborated together many times, and a thin line means collaboration has occurred infrequently.

 


 
Figure 3. The Biology collaboration network using publication data from Biology Research Support Office, 2001–2007. All types of publication used. Data are weighted and undirected. There are 50 nodes with five components; one large and four small. A thick line indicates the researcher pair has collaborated together many times, and a thin line means collaboration has occurred infrequently.

 
The Biology collaboration network consisting of 50 researchers is shown in Fig. 3 where edges are weighted and all types of publication are used from data from the Biology Research Support Office. The weight of the edges is affected by the type of publication used. For example, researchers 97 and 110 collaborated 20 times in Fig. 3 compared with 15 times in Fig. 2 where only journal publications were used.

A random sample of six Biology researchers was taken from the Web of Science data set and compared with the data set from the Biology Research Support Office (Table 1). Table 1 implies a tendency for the Web of Science data set to overestimate the number of publications, that is, to include false positives.



 
Table 1. Researcher degree, number of publication records and percentage difference between a random sample of six Biology researchers

 
Chemistry Collaboration Network
Publication data were used to produce a weighted Chemistry collaboration network (Fig. 4). Forty-five out of 62 researchers (73%) collaborated with other researchers in the Chemistry department compared with 59% in the Biology department when both data sets used all types of publications and weighted data. Figure 4 has one giant and one small component. Within the giant component, two sub-components are evident, connected by researchers 6, 26 and 43. The strongest links are between researchers 37 and 10 and between researchers 33 and 61, with 88 and 101 publications, respectively. This suggests that these researchers had either an intense relationship for a short period of time, producing many papers, or a consistent relationship over a long period of time.


 
Figure 4. The Chemistry collaboration network using publication data from the Chemistry department, 2001–2007. All types of publication used. Data used are weighted and undirected. Forty-five researchers from a possible 62 collaborated with their chemistry colleagues. The weight of the edge is indicated by the thickness of the line, with a thick line indicating many collaborations and a thin line a few collaborations together.

 
Researcher Degree
The degree is the number of ties a node has to other nodes.10 Researchers are likely to influence those they are directly adjacent to.25 The pattern of ties in a network defines a researcher's social role and position.6, 7 Nieminen's measure (equation 1)26 was used to express the number of nodes adjacent to a point Pk, where Pi and Pk represent two nodes:


\[ C_D(P_k) = \sum_{i=1}^{n} a(P_i, P_k) \qquad (1) \]

where n is the number of nodes in the network and a(Pi, Pk) = 1 if and only if Pi and Pk are connected by a line, otherwise 0.

CD(Pk) is confounded by network size because it is an absolute count of degree. To compare the relative centrality of points in the Chemistry and Biology network graphs, CD(Pk) was normalized. Tables 2 and 3 show the degree and normalized degree for the Chemistry and Biology collaboration networks, respectively, using weighted undirected data. Table 4 shows that the average number of links each researcher had with others was 2.6 in the Biology collaboration network and 4.8 in the Chemistry collaboration network.
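A minimal Python sketch of the normalization follows. It assumes that normalized degree is the degree expressed as a percentage of the n − 1 other nodes in the network; this is an assumption, although it is consistent with values reported in this article (for example, a degree of 1 in the 50-node Biology network gives 2.041, and a degree of 12 in the 45-node Chemistry network gives 27.273). The function name is hypothetical.

```python
def normalized_degree(degree, n):
    """Degree as a percentage of the n - 1 other nodes a researcher could be tied to."""
    return 100.0 * degree / (n - 1)

biology_low = round(normalized_degree(1, 50), 3)      # one tie among 49 possible
chemistry_high = round(normalized_degree(12, 45), 3)  # twelve ties among 44 possible
```

Because the result is a share of the possible ties rather than a raw count, values from the 50-node and 45-node networks can be compared directly.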



 
Table 2. Degree and normalized degree for 45 authors in the Chemistry collaboration network

 



 
Table 3. Degree and normalized degree for 50 authors in the Biology collaboration network

 



 
Table 4. Degree and normalized degree summary statistic table for Biology and Chemistry authors

 
Table 2 shows that researchers 6 and 16 had a normalized degree of 27.273 and 25.000, respectively, occupying central positions in the Chemistry collaboration network. Seven researchers had a normalized degree of 2.273, occupying a peripheral position in the network.

Table 3 shows that researchers 88 and 64 have the highest normalized degrees of 16.327 and 14.286, respectively, suggesting that they occupy a central position in the Biology collaboration network and are able to influence whether collaboration takes place. Conversely, 18 researchers have a normalized degree of 2.041. This suggests that these researchers occupy a peripheral position in the network and are unlikely to influence the potential for collaboration.

The degree distribution gives the number of researchers with a particular degree and describes structural properties of the network. Figure 5 shows the degree distribution of the Biology and Chemistry collaboration networks using binary data.
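As a brief illustration, a degree distribution can be tabulated from binary data in a couple of lines of Python. The graph and the function name are hypothetical.

```python
from collections import Counter

def degree_distribution(adj):
    """Map each degree value to the number of nodes that have it."""
    return Counter(len(nbrs) for nbrs in adj.values())

# Hypothetical binary undirected network: B is tied to A, C and D
adj = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
dist = degree_distribution(adj)
```

In this example three nodes have a degree of 1 and one node has a degree of 3, the same kind of tally that underlies a degree distribution plot.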


 
Figure 5. Degree distribution for (A) the Biology Collaboration network and (B) the Chemistry collaboration network. Most researchers in the Biology collaboration network (18 researchers) are connected to few individuals and have a degree of 1 whereas eight researchers have a large degree (5–8) and are highly connected to others. The Chemistry collaboration network has six researchers with a degree of 1 and one researcher with a degree of 12. Twenty-six researchers have a degree of 5–12. To summarize, the Chemistry collaboration network has more researchers with higher degree who occupy a central position in the network than the Biology collaboration network.

 
Betweenness
A researcher is also central to a social network if they are an intermediary node between the paths of others.26 A researcher will pursue all pathways that are independent of each other (maximum flow) to collaborate with another proportionally to the length of the pathways.27 Flow betweenness measures the number of times a researcher lies on all paths between another researcher pair. Flow betweenness can be expressed by equation 2:7


\[ C_F(x_i) = \sum_{j<k} m_{jk}(x_i), \quad j \neq i \neq k \qquad (2) \]

where mjk is the maximum flow from node xj to node xk and mjk(xi) is the maximum flow from xj to xk that passes through node xi.

Flow betweenness increases with network size and density because it is an absolute count, which prevents comparison between two network data sets. Instead, normalized flow betweenness (equation 3)7 can be used to compare betweenness in the Biology and Chemistry collaboration networks. It is obtained by dividing the flow that passes through xi by the total flow between all pairs of nodes in which xi is neither the source nor the target.


\[ C'_F(x_i) = \frac{\sum_{j<k} m_{jk}(x_i)}{\sum_{j<k} m_{jk}}, \quad j \neq i \neq k \qquad (3) \]

Researchers in the Biology and Chemistry collaboration networks with high normalized flow betweenness can be considered brokers, that is, researchers who have the power to initiate or prevent collaboration between another pair of researchers.11 Tables 5 and 6 describe normalized betweenness using UCINET's Flow Betweenness algorithm and weighted data for the Chemistry and Biology collaboration networks, respectively.
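Flow betweenness and its normalized form can be sketched in Python as follows. This is not UCINET's routine. Two assumptions are made: the flow through a node for a given pair is taken as the drop in maximum flow when that node is removed, and each unordered pair is counted once. Maximum flow is found with a standard Edmonds-Karp search; the three-node weighted graph and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def max_flow(cap, source, sink, skip=None):
    """Edmonds-Karp maximum flow on an undirected weighted graph.

    cap maps node -> {neighbour: capacity}; skip is a node to leave out.
    """
    # Residual capacities, with the skipped node removed
    res = {u: {v: c for v, c in nbrs.items() if v != skip}
           for u, nbrs in cap.items() if u != skip}
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Collect the path, find its bottleneck, then push that much flow
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        total += push

def flow_betweenness(cap):
    """Raw and normalized flow betweenness for every node."""
    raw, norm = {}, {}
    for i in cap:
        through = total = 0
        others = [x for x in cap if x != i]
        for j, k in combinations(others, 2):
            m_jk = max_flow(cap, j, k)
            # Flow attributed to i: total flow minus the flow possible without i
            through += m_jk - max_flow(cap, j, k, skip=i)
            total += m_jk
        raw[i] = through
        norm[i] = through / total if total else 0.0
    return raw, norm

# Hypothetical weighted network: edge weights are numbers of joint publications
cap = {"A": {"B": 2, "C": 1}, "B": {"A": 2, "C": 1}, "C": {"A": 1, "B": 1}}
raw, norm = flow_betweenness(cap)
```

In this example every node carries one unit of flow for the pair it lies between, but the normalized values differ (0.5 for A and B, one third for C) because the total flow between the remaining pair differs.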



 
Table 5. Normalized weighted flow betweenness for 45 researchers in the Chemistry collaboration network

 



 
Table 6. Normalized weighted flow betweenness for 50 researchers in the Biology collaboration network

 
Researchers 88, 67 and 113 were the top three brokers with normalized flow betweenness of 19.969, 12.822 and 11.476, respectively, in the Biology collaboration network shown in Table 6. Researcher 67 influences communication flow in the network but is not the most connected author. Researcher 67 has a normalized degree of 10.204 compared with a minimum of 2.041 and maximum 16.327. Eighteen of the 50 researchers have a normalized flow betweenness of zero indicating they are not intermediaries in the flow of communication between two researchers.

Researchers 21, 6 and 26 are the top three brokers in the Chemistry collaboration network with normalized flow betweenness values of 19.095, 16.254 and 14.095, respectively. Researcher 26 is one of the nodes that connect the two sub-components of the giant component and has a normalized degree of 6.818 compared with a minimum of 2.273 and maximum of 27.273. This implies that researcher 26 can influence communication flow without being a highly connected researcher. Researcher 26 is connected to two highly connected researchers, 16 and 6, who have normalized degrees of 25.000 and 27.273, respectively. Seven of the 45 researchers in the Chemistry collaboration network have a normalized flow betweenness of 0.00, compared with 18 of the 50 researchers in the Biology collaboration network. This implies that intermediaries in the Chemistry collaboration network have greater potential to hinder or promote communication than those in the Biology collaboration network.

Research Foci
Is research focus a meaningful way to classify authors? This can be examined by analysing the normalized degree of a researcher to researchers within the same research focus and to researchers in different research foci. Classification of each node into one of the nine research foci may reveal whether a researcher tends to collaborate with people within their own research focus or with people in other foci. This classification is shown in Fig. 6 using the Biology collaboration network data. Figures 7 and 8 show normalized degree within and between research foci, respectively. Figure 8 shows that Bioinformatics and Mathematics has a normalized between-foci degree of 0.17 compared with a normalized within-focus degree of 0.06. This implies that Bioinformatics and Mathematics authors collaborate more with authors of other foci than with each other. Figure 9 summarizes the number of links between research foci on a network graph.
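The within-focus and between-foci comparison can be sketched in Python by classifying each tie according to the foci of its two end nodes. The sketch returns raw counts only, because the normalization used for Figs 7 and 8 is not specified here; the edge list, the assignment of researchers to foci and the function name are hypothetical.

```python
def within_between(edges, focus):
    """Count each node's ties inside and outside its own research focus."""
    counts = {node: {"within": 0, "between": 0} for node in focus}
    for a, b in edges:
        kind = "within" if focus[a] == focus[b] else "between"
        counts[a][kind] += 1
        counts[b][kind] += 1
    return counts

# Hypothetical binary undirected ties and focus labels
edges = [("R1", "R2"), ("R2", "R3"), ("R3", "R4")]
focus = {
    "R1": "Ecology and Evolution",
    "R2": "Ecology and Evolution",
    "R3": "Bioinformatics and Mathematics",
    "R4": "Bioinformatics and Mathematics",
}
counts = within_between(edges, focus)
```

Summing the counts over the members of a focus, and dividing by a chosen measure of opportunity such as the number of members, would give per-focus values that can be compared across foci of different sizes.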


 
Figure 6. Classification of the Biology collaboration network by research foci. Relationships are undirected and binary. All types of publication were used. The key details each author by research focus.

 


 
Figure 7. Normalized number of links within each research focus in the Biology department at the University of York. Data are binary and undirected. The normalized degree within a focus ranges from 0 to 0.29 with a mean of 0.14. Ecology and Evolution has the highest normalized degree within a focus of 0.29. Bioinformatics has the lowest normalized degree within a focus of 0.06. This implies that Ecology and Evolution is well connected within its own focus and Bioinformatics is poorly connected. However, Ecology and Evolution (population 9) has more opportunity to collaborate within its own focus than Bioinformatics (population 2).

 


 
Figure 8. Normalized number of links between research foci in the Biology department at the University of York. Data are binary and undirected. The normalized between-foci degree ranges from 0.13 to 0.25. Biochemistry and Biophysics has the highest normalized between-foci degree of 0.25. Conversely, Molecular and Cellular Medicine has the lowest normalized between-foci degree of 0.04. Molecular and Cellular Medicine authors collaborate to a lesser extent with other foci.

 


 
Figure 9. Number of times collaboration has occurred between authors of different foci. Data are undirected.

 
Normalized flow betweenness can be used to measure whether a focus can promote or hinder communication flow. If normalized flow betweenness using weighted data is summed for all authors in a focus and divided by the number of authors, the average normalized flow betweenness can be calculated. Figure 10 shows the average normalized flow betweenness between foci. The data suggest that the two foci, Ecology and Evolution and Bioinformatics and Mathematics, with normalized flow betweenness values of 5.48 and 5.40, respectively, are influential in promoting or inhibiting potential for collaboration, indicating that they may act as brokers in the network. Molecular Microbiology and Molecular and Cellular Medicine have little influence in promoting collaboration with normalized flow betweenness values of 0 and 0.02, respectively. All five authors belonging to the Molecular and Cellular Medicine group are disconnected from the main component of the graph, indicating that they either do not collaborate with others or collaborate with researchers outside of the Biology department.
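The averaging step described above can be sketched in Python: sum the normalized flow betweenness of all authors in a focus and divide by the number of authors. The values, focus labels and function name are hypothetical.

```python
from collections import defaultdict

def mean_by_focus(values, focus):
    """Average a per-author value over the authors belonging to each focus."""
    totals, sizes = defaultdict(float), defaultdict(int)
    for node, value in values.items():
        totals[focus[node]] += value
        sizes[focus[node]] += 1
    return {f: totals[f] / sizes[f] for f in totals}

# Hypothetical normalized flow betweenness values and focus labels
values = {"R1": 4.0, "R2": 6.0, "R3": 0.0}
focus = {"R1": "Focus X", "R2": "Focus X", "R3": "Focus Y"}
averages = mean_by_focus(values, focus)
```

A focus whose members all have zero betweenness, like Focus Y here, averages zero, which is the pattern expected for a group of authors disconnected from the main component.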


 
Figure 10. Average normalized flow betweenness values for each research focus. Weighted undirected data are used. The Ecology and Evolution focus and the Bioinformatics and Mathematics focus act as brokers in the Biology collaboration network. The normalized flow betweenness values for Bioinformatics and Mathematics and for Ecology and Evolution are 5.397833 and 5.479818, respectively. The normalized flow betweenness values for Molecular Microbiology and for Molecular and Cellular Medicine are 0.0 and 0.014, respectively.

 
Clustering
Is There Evidence of Clustering in the Biology and Chemistry Collaboration Networks?
High transitivity or clustering means that there is a heightened probability of two people having collaborated if they have one or more collaborators in common.11 Transitivity can be measured by the clustering coefficient described by equation 4.9


[Equation 4: clustering coefficient]   (4)

The Biology collaboration network shows little evidence of clustering, since only 0.09% of triples of all kinds are transitive (Table 7). The network contained 102 transitive triples out of a possible 117 600 triples of all kinds. It has a density of 0.05, making collaboration between two researchers who have a collaborator in common unlikely.



 
Table 7. Transitivity of the Biology and Chemistry collaboration networks

 
In comparison, the Chemistry collaboration network shows 624 transitive triples out of a possible 85 140 triples of all kinds. It has a higher percentage of transitive triples (0.72%) than Biology (0.09%) and a density of 0.1. Two authors in the Chemistry collaboration network are more likely than two authors in the Biology collaboration network to publish together if they have a collaborator in common.
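The percentages quoted for the two networks can be checked with a short calculation. The sketch below assumes that "triples of all kinds" means the number of ordered triples of distinct nodes, n(n - 1)(n - 2). That reading is an assumption made here, although it does reproduce the totals of 117 600 (50 nodes) and 85 140 (45 nodes) given in the text. The function names are introduced for illustration.

```python
def ordered_triples(n):
    """Number of ordered triples of distinct nodes in an n-node network."""
    return n * (n - 1) * (n - 2)

def percent_transitive(transitive, n):
    """Transitive triples as a percentage of all ordered triples."""
    return 100.0 * transitive / ordered_triples(n)

# Counts of transitive triples and network sizes as reported in the text.
biology = percent_transitive(102, 50)
chemistry = percent_transitive(624, 45)

print(ordered_triples(50), ordered_triples(45))
print(round(biology, 2), round(chemistry, 2))
```

Under this assumption the Biology value rounds to the 0.09% quoted, while the Chemistry value comes out at roughly 0.73%, marginally above the 0.72% quoted, which may simply reflect rounding in the source.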

Chemistry and Biology Collaboration Network ER Random Graphs
Erdős and Rényi9 proposed the Gn,p random graph to describe the random occurrence of a collection of nodes connected by edges.11, 28 Each pair of nodes in a Gn,p graph is connected with independent probability p, or left unconnected with probability 1 – p.
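The Gn,p construction described above can be sketched with the Python standard library alone. This is an illustrative generator, not the software used in the study: the function names and the seed are arbitrary, and the values n = 98 and p = 0.0406 are taken from the combined network reported in Fig. 11.

```python
import itertools
import random

def gnp_edges(n, p, rng):
    """Return the edge list of one G(n,p) graph on nodes 0..n-1.

    Each unordered pair of nodes is joined independently with probability p.
    """
    return [pair for pair in itertools.combinations(range(n), 2)
            if rng.random() < p]

def density(n, edges):
    """Fraction of the n(n-1)/2 possible undirected edges that are present."""
    return len(edges) / (n * (n - 1) / 2)

rng = random.Random(1)               # arbitrary seed, for repeatability only
edges = gnp_edges(98, 0.0406, rng)   # size and density of the observed network
print(len(edges), round(density(98, edges), 4))
```

In this formulation the realised density fluctuates around p from one replication to the next; a generator that fixes the number of edges instead would hold the density exactly constant.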

The observed network is a binary network consisting of researchers from the Biology and Chemistry departments.29, 30 The 1000 Gn,p graphs have the same number of nodes and the same density as the observed data. If the average of a measure such as transitivity, flow betweenness or degree in the observed data does not deviate from the distribution of averages across the 1000 random graphs, collaboration is implied to be a random process.29, 31, 32 Figure 11 displays a network graph of the observed data, and Fig. 12 is a frequency distribution of the average flow betweenness over all nodes in each of the 1000 ER random graphs.


 
Figure 11. Network graph using data from Biology and Chemistry publication records. Data are binary and undirected and all types of publication data were used. There are three components disconnected from the giant component. The Chemistry and Biology collaboration network combined has 98 nodes and a density (matrix average) of 0.0406.

 


 
Figure 12. Frequency distribution of average flow betweenness in 1000 ER random graphs. The distribution peaks at the 112–114 interval, with 241 of the 1000 graphs having an average flow betweenness in this range. The 130–132 and 97–99 intervals had the lowest frequencies, 2 and 3, respectively. The shape of the line deviates slightly from the Gn,p frequency distribution, with a sharp peak at 112–114 as opposed to a rounded peak. This may be because too few data points were used.

 
The average flow betweenness for the 1000 ER random graphs was 114.363, compared with 130.966 in the observed network data. The observed value lies above the 95th percentile (>127) of the frequency distribution of the 1000 random graphs; average flow betweenness is therefore higher in the observed data than expected at random. Flow betweenness in the Biology and Chemistry scientific collaboration network is not the product of a random process.
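The comparison with the 95th percentile amounts to asking what share of the random-graph averages fall below the observed value. A minimal sketch follows. The list of null values is a hypothetical, evenly spaced stand-in for the 1000 per-graph averages (the real values are not given in the text), and `fraction_below` is a name introduced here.

```python
def fraction_below(observed, null_values):
    """Fraction of null-model values strictly below the observed value."""
    return sum(v < observed for v in null_values) / len(null_values)

# HYPOTHETICAL stand-in for 1000 per-graph averages (not the study data):
# evenly spaced values from 100.00 to 129.97.
null_values = [100 + 0.03 * i for i in range(1000)]

observed = 130.966  # observed average flow betweenness reported in the text
share = fraction_below(observed, null_values)
print(share, share > 0.95)
```

A share above 0.95 corresponds to the observed value lying beyond the 95th percentile of the null distribution. With only one observed network, this remains a descriptive comparison rather than a formal test.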

If our observed data for node degree approximate a random graph, each node in the observed data will have a degree similar to the graph's average; few nodes will have a degree much higher or lower than that average. The average degree (3.939) did not differ between the 1000 ER graphs and the observed data. This is because the number of edges attached to each node may vary within one Gn,p replication, but the average number of edges per node in each graph does not differ from that in the other 999 Gn,p replications.32
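As a worked check on the figures above, the mean degree of an undirected graph follows directly from its number of nodes and its density. The symbols n (nodes), m (edges), d (density) and \bar{k} (mean degree) are notation introduced here for illustration; the values n = 98 and d = 0.0406 are those reported for the combined network.

```latex
\[
\bar{k} \;=\; \frac{2m}{n} \;=\; d\,(n-1),
\qquad
d \;=\; \frac{2m}{n(n-1)}
\]
\[
n = 98,\quad d = 0.0406
\;\;\Longrightarrow\;\;
\bar{k} \approx 0.0406 \times 97 \approx 3.94
\]
```

This agrees with the reported average degree of 3.939 to within rounding of the density. It also shows why any graph with the same number of nodes and the same density has the same average degree, even though the degrees of individual nodes differ.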

Transitivity shows that two authors are likely to collaborate if they have an acquaintance in common.9 The percentage of transitive triples for the random and observed networks is <0.01% and 0.09%, respectively. Thus, transitivity in the observed data occurs more often than at random. It is likely that the process of introducing colleagues to one another is important in the community structure of the Biology and Chemistry collaboration network.
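The random-graph baseline for transitivity can be approximated by simulation. The sketch below makes two assumptions that are not confirmed by the text: that each closed triangle contributes six ordered transitive triples, and that the denominator is n(n - 1)(n - 2). The generator, function names and seed are illustrative and do not reproduce the software used in the study.

```python
import itertools
import random

def gnp_adjacency(n, p, rng):
    """Adjacency sets of one G(n,p) graph on nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def percent_transitive_triples(adj):
    """Ordered transitive triples as a percentage of all ordered triples.

    Assumes each triangle yields 6 ordered transitive triples.
    """
    n = len(adj)
    triangles = 0
    for u in range(n):
        for v in adj[u]:
            if v > u:
                # Count each triangle u < v < w exactly once.
                triangles += sum(1 for w in adj[u] & adj[v] if w > v)
    return 100.0 * 6 * triangles / (n * (n - 1) * (n - 2))

def null_mean(n, p, replications, seed=0):
    """Mean percentage of transitive triples over many G(n,p) graphs."""
    rng = random.Random(seed)
    values = [percent_transitive_triples(gnp_adjacency(n, p, rng))
              for _ in range(replications)]
    return sum(values) / len(values)

# Node count and density of the combined network, 1000 replications as in the text.
mean_pct = null_mean(98, 0.0406, 1000)
print(mean_pct)
```

Under these assumptions the simulated mean falls below 0.01%, in line with the value quoted for the random graphs and well under the 0.09% observed.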


    Discussion
 
With knowledge of the structure of the Biology and Chemistry collaboration network, derived from co-publication of scientific papers, what have we learnt about interdisciplinary research and scientific collaboration? The results support the idea that authors collaborate in a non-random way: measures of transitivity and betweenness in the observed co-authorship network deviate from a frequency distribution of 1000 ER random graphs. Authors make informed decisions about whom to collaborate with. A critique of the assumptions underlying the measures used to analyse the scientific collaboration network is needed to assess the significance of the results. Is the structure of the Biology and Chemistry networks reported in this work replicable in other networks? The data mining methods used to analyse publication data from the Biology and Chemistry departments, and the problem of missing data, may have skewed the results and given an inaccurate representation of the Biology and Chemistry collaboration networks.

Data Mining and Sampling Bias
A tie between two authors occurs when they have published a scientific paper together. A tie is a crude measure of collaboration, since the author list may not reflect everyone who collaborated.33 Gift authorship occurs when an author who has made no contribution to the research, but has influenced the career of a main author, is included in a publication.34 Conversely, an individual may have made significant contributions to the research effort without being included in the author list.

The use of co-publication to represent a tie has repercussions when sampling ties in the Biology and Chemistry collaboration networks. Bias is created when author lists are incomplete. Authors employed for only part of the 2001–2007 period are less likely to have collaborated with many of their colleagues than an author who worked in the department for the full period. The effects of missing data are unknown, since we are unable to compare the observed network data with a data set that includes all research fellows, professors and lecturers. The Biology and Chemistry networks have different numbers of links, despite having similar population sizes. The Chemistry collaboration network had a density of 0.11 with 45 nodes, compared with a density of 0.05 in the Biology collaboration network with 50 nodes. Therefore, caution should be exercised when making comparisons between the two networks. Constructing a scientific collaboration network is a time-consuming process, making it difficult to produce a number of replicates for each network.31

The boundary specification problem35 refers to the inclusion criteria for authors in social networks. Collaborations with authors outside the Biology and Chemistry departments at the University of York were not recorded in the data sets. Thus, we cannot comment on the total degree of an author, only on the degree in relation to other authors in the Biology and Chemistry departments.

Random Graphs
Comparing a measure in the observed network data with the distribution of that measure in random graphs introduces a control. Without this comparison, we would not know whether the observed data could have arisen at random. One thousand replications of the ER graph are adequate to investigate whether the Biology and Chemistry collaboration network differs from a random distribution (Fig. 12). However, we cannot statistically infer whether the Biology and Chemistry collaboration network deviates significantly from the random model because of the small sample size.36 Thus, we have only carried out descriptive analysis of the networks. Although the data are rich in information, generalizations about social relationships cannot be made.

A problem of comparing social network data with a random graph is that random graphs poorly reflect social networks. The Biology and Chemistry collaboration networks violate the independence assumption of Gn,p graphs because collaboration networks demonstrate transitivity where nodes are dependent on others: a property that ER random graphs lack.28 However, given the tools available to compare observed network data with a null model, the ER random model was adequate. Similarly, care should be taken when attempting to fit network data to a network model such as small world due to small sample size.


    Conclusions
 
We conclude that scientific collaborators, quantified using the Biology and Chemistry collaboration networks, seek expertise from their colleagues in a non-random way to fulfil a research goal. Highly connected authors have the potential to influence whether collaboration occurs. Some authors may not be well connected but influence collaboration by acting as a communication point between two highly connected authors. Publication records have provided a documented source of information about professional relationships among researchers. Future directions could include investigating how scientific collaboration changes over time in the Chemistry and Biology departments. Will the same authors be identified as brokers or as highly connected in 4 years' time? Similarly, any change in foci membership over time could be investigated, along with whether this affects the practice of interdisciplinary research.


    Funding
 
Funding to undertake this research was provided by the Department of Biology, University of York.


    Acknowledgements
 
The author is grateful to Dr Daniel Franks and Dr Leo Caves for providing feedback.


    References
 

  1. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature (1998) 393:440–442.
  2. NIH. http://nihroadmap.nih.gov/interdisciplinary (26 February 2008).
  3. Birnholtz JP. What does it mean to be an author? The intersection of credit, contribution, and collaboration in science. J Am Soc Inf Sci Technol (2006) 57:1758–1770.
  4. Cummings J, Kiesler S. Coordination costs and project outcomes in multi-university collaborations. Res Policy (2007) 36:1620.
  5. Newman MEJ. Scientific collaboration networks. I. Network construction and fundamental results. Phys Rev E (2001) 64:016131.
  6. Knoke D, Kuklinski JH. Network Analysis (1983) 2nd ed. Thousand Oaks: Sage University Paper.
  7. Freeman LC, Borgatti SP, White DR. Centrality in valued graphs: a measure of betweenness based on network flow. Soc Networks (1991) 13:141–154.
  8. Newman MEJ. Coauthorship networks and patterns of scientific collaboration. Proc Natl Acad Sci USA (2004) 101:5200–5205.
  9. Newman MEJ, Barabási AL, Watts DJ. The Structure and Dynamics of Networks (2006) Princeton, NJ: Princeton University Press.
  10. Barabási AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet (2004) 5:101–113.
  11. Newman MEJ. The structure of scientific collaboration networks. Proc Natl Acad Sci USA (2001) 98:404–409.
  12. Sheldon T, Canning AM, Demaine R, et al. Information Needs for a World Class University (2008) York: University of York, Information strategy group.
  13. www.transit.york.ac.uk (15 November 2007).
  14. http://isiknowledge.com/wos (10 November 2007).
  15. Borgatti SP, Everett MG, Freeman LC. Ucinet 6.0 Version 6.175 (2002) Harvard, MA: Analytic Technologies.
  16. Persson O. Bibexcel, Bibliometric Tool (2007) Version 2007-10-30. Inforsk. http://www.umu.se/inforsk/Bibexcel/.
  17. Microsoft Excel. (2007).
  18. http://bioltfws1.york.ac.uk/biostaff/staff.php (29 October 2007).
  19. http://www.york.ac.uk/depts/chem/staff/staff.html (3 January 2008).
  20. Endnote Reference Manager. http://www.endnote.com/ (5 January 2008).
  21. Aho AV, Kernighan BW, Weinberger PJ. The Awk Programming Language (1988) Reading, MA: Addison-Wesley Publication Company.
  22. Caves L. Personal Communication, March 3, 2008.
  23. Borgatti SP, Freeman LC, Everett MG. UCINET 5 for Windows, Software for Social Network Analysis, User Guide (2005) Harvard, MA: Analytic Technologies.
  24. Borgatti SP. A Brief Guide to Netdraw (2008) Harvard, MA: Analytic Technologies.
  25. Wellman B, Berkowitz SD. Social Structures: A Network Approach (1988) 1st ed. New York: Cambridge University Press.
  26. Freeman LC. Centrality in social networks: conceptual clarification. Soc Networks (1979) 1:215–239.
  27. Hanneman R, Riddle M. Introduction to Social Network Methods (2005) Riverside, CA: University of California.
  28. Newman MEJ, Strogatz SH, Watts DJ. Random graphs with arbitrary degree distributions. Phys Rev E (2001) 64:026118.
  29. Manly BFJ. Randomization, Bootstrap and Monte Carlo Methods in Biology (1997) 2nd ed. London: Chapman & Hall.
  30. Robins G, Pattison P, Kalish Y, et al. An introduction to exponential random graph (p*) models for social networks. Soc Networks (2007) 29:173–191.
  31. James R, Croft D, Krause J. Potential banana skins in animal social network analysis. Behav Ecol Sociobiol (2009) in press.
  32. Newman MEJ. Random graphs as models of networks. In: Bornholdt S, Schuster HG, eds. Handbook of Graphs and Networks (2003) Berlin: Wiley VCH.
  33. Claxton LD. Scientific authorship, Part 2. History, recurring issues, practices and guidelines. Mutat Res (2004) 589:31–45.
  34. Fuchs S. The Professional Quest for Truth: A Social Theory of Science and Knowledge (1992) Albany, NY: State University of New York Press.
  35. Kossinets G. Effects of missing data in social networks. Soc Networks (2006) 28:247–268.
  36. Coolican H. Research Methods and Statistics in Psychology (2004) 4th ed. London: Hodder Arnold.

    Author Biography 
 
    Leana Bellanca studied for a BSc (Hons) degree in Molecular Cell Biology at the University of York. She developed interests in Systems Biology, including network theory and relational data analysis. Leana is also interested in Immunology and Epidemiology. She is employed as a Medical Information Officer at Professional Information, Richmond, North Yorkshire. This involves drug safety reporting and providing information about medicinal products to healthcare professionals and patients. Leana intends to remain in this field of work.
Submitted on 30 September 2008; accepted on 18 December 2008

