Monthly Archives: September 2009

HALI Project Presentation from the GL-CRSP End of Program Conference

The Co-Principal Investigators of the Health for Animals and Livelihood Improvement (HALI) project, the mothership to which my calf study belongs, gave a really nice overview of the HALI project’s background, goals, and preliminary results this summer at the Global Livestock CRSP’s End of Program Conference “From Problem Models to Solutions” on June 17th, 2009 in Naivasha, Kenya.  The Co-PIs, Professors Jon Erickson of the University of Vermont and Rudovick Kazwala of the Sokoine University of Agriculture, have been very helpful during the course of my study, and it was great to see them in Naivasha.  As thanks (though Jon will probably kill me for it), I uploaded their slides and the audio I captured in Naivasha to SlideShare as a SlideCast, a nice interactive tool allowing you to sit on your sofa at home with a cool glass of Chimay and absorb the intrigue of academic lectures.  Sweet memories of the university are nurtured by each click of the slideshow…. Plus, Kaz gives me a shout-out on Slide 27: “…one master’s student from UC Davis, which is David here, [pointing at me as I acknowledge the audience and nod over the recording equipment],” so you gotta check it out!

Now, without further ado, allow Drs. Erickson and Kazwala to present:

“The One Health Approach to Solve Complex Problems and Improve Livelihoods at the Human-Livestock-Wildlife Interface”

Questions from the audience are almost unintelligible in the audio, so I have listed them below.  The full presentation will be included as an article in the upcoming GL-CRSP End of Program Conference Proceedings, edited by yours truly, to be released in early 2010…


What did you mean by the environment, as to where the diseases come from?  Why?

The environment largely concerns water, but also other vectors, like flies, wildlife and so on.  As to why we are seeing a resurgence of disease due to a water scarcity, we need to consider the wetlands, which serve as a sponge. If the wetlands were not there, the water would flow out and dry up the entire ecosystem.  During the dry season the water slowly trickles out of the wetlands and provides water for the ecosystem.  And so one of the driving factors is the effect of grazing pressure on compromising these wetlands.  After the removal of pastoralists from these areas, they’ve seen a rebound in wetlands and also in water provision during the dry season.

Given evidence for the linkages, how do you propose to tackle them?

We propose to tackle them through a One Medicine [One Health] approach.  The concept is to create a bridge across the three populations: veterinary teams, medical teams, and other teams integrated to deal with the questions.  The diseases [zoonotic diseases] in the lab are all the same diseases.  Teams need to work in the same environment and in the same lab on the same diseases.  We need better integration and common interest.  This is the case for the One Health approach.  Other things to look at are landscape and bio-regulatory function.  Ecosystem services, for example, are very critical in this role, and water and health are very intertwined at the landscape scale.

Map Skills…

Remember these?

Map Skills were intense in 3rd grade.  Workbooks, worksheets, new worlds.  Though I’ve colored-in many maps in my day, I never thought I’d have the skills to make my own.   But alas, today I waved my nerd flag and created a series of maps for the study, complete with projections from the old Geographic Coordinate System (WGS 1984 World) to the UTM WGS 1984 36S projection specific to Central Tanzania.  “Damn that’s dorky,” says my inner child, and yes indeed, working with ArcGIS is dorky.  Perhaps the dorkiest.  But it makes pretty maps, and ones that you can analyze.  So next time you need to find out which shopping centers near you have registered cases of children’s diarrhea and are 100 yards from a public restroom, or other recipes conducive to shitting your pants, you just let me know…

Diarrhea and Protozoan HHs

An example map of households containing calves with clinical diarrhea and identified protozoan infection (Crypto and/or Giardia) – shitty resolution so ya’ll can’t steal it before it’s published. Shame on you Chris Bond…

Data Analysis Part 1: Database Management, Distributions, and Bivariates

Binary Wrapping Paper available at “Think Geek!”

This is part 1 of how to analyze data, for those who have never analyzed data…

Part 1: Database Management, Distributions (aka Descriptive Statistics), and Bivariate Analysis

If you have never created, organized, or utilized data from a database, analysis will truly suck for you.  If you have experience working with and organizing data in Microsoft Excel or Access, or some other database program like FileMakerPro, then it will just suck.

Step 1: Filter and Clean your data. This is not a one-stop shop, but an iterative process that will evolve along with your understanding and appreciation for your data or lack thereof.  All aspects of your field work, be they survey responses, laboratory results, observations, etc., must be evaluated to determine the most logical way of representing the information.  Do not take advice from others on this process, unless these “others” have looked at your data and at least begun the iterative process with you.  Only you will ever look at your database, work with your database, and wish you had never created it.  That said, do not expect to organize your information, say in tabular format, and then quickly proceed to analysis.  It literally takes multiple attempts to develop a structure that logically extends into the analysis process.  Binary wrapping paper (see above) is an example of how not to organize your data.  I, for example, assigned binary values to all variables before deeply analyzing the distribution of the data and determining whether a binary bivariate analysis of categorical data was in fact what I needed.  I have personally re-developed my database three times: binary; non-binary; and finally a combination of binary and non-binary variables determined by a deep analysis of variable distribution, logical assumption, and intended utilization.  This third version was only possible after multiple phases of trial and error with the Bivariate analysis, Step 3.  Each reorganization of the database was deeply painful.  I had never before walked aimlessly around a room out of sheer desperate boredom in my life.  It is like prison and solitary confinement and can drive you mad unless you have interpersonal interaction.  Do this process in the company of others to feel less crazy.  Get exercise, try not to smoke.  Keep lots of whiskey handy.  Drink heavily after pushing “save.”

Get a better office.

Step 2, Distributions. So you think your database is ready eh?  Well OK. Let’s find out.  Do you know statistics?  If yes, good for you.  Benjamin Disraeli was quoted once as saying statistics are a form of lying: “lies, damned lies, and statistics.”  You will quickly learn why.  But first, you will quickly learn why you don’t need to know statistics, and therefore why you have no idea what is happening inside your computer with your data.  Thus, you won’t be able to explain your data mathematically, and will be forced to simply accept that your computer has done its job, and that your statistical results are not lies, though you have no idea why.  Quite like a creationist’s explanation of Genesis: “The bible said so, so it is, I have dominion.”  No explanation necessary.  Yet science is not dogmatic, because we’re statistically less crazy than creationists.

Anyway, I use JMP Statistical Software from the SAS Institute.  JMP is a menu driven program that is very visual and requires no previous experience with code languages.  Why did I choose JMP?  To maximize profit.  JMP is cheap at UC Davis.  Less than $30.  It must be an incentive program to get young scientists addicted to JMP, so when they graduate they pay the full amount for a professional license.  Like handing out crack for free on the corner, knowing they’ll pay premium once they’re hooked, like this guy…

A typical UC Davis JMP user in the Population Biology Graduate Group…

Descriptive statistics are done for you by your program.  If you want to know the theory, take a stats class or go to the library.  It’s boring.  Basically, a series of distributions are shown of your data.  Depending on the type of data (nominal/categorical, ordinal, or continuous), your program will spit out some figures and charts and tables.  You get to look at these and say: Thanks JMP!  But then you have to remember that you’re moving towards Step 3: Bivariate Analysis, and so maybe you should be more thorough.  For example, maybe the herd sizes in your study have a natural breaking point around 124 animals based on a histogram output shown by these descriptive statistics, where 52% of herds have fewer than 124 head of cattle, and 48% have more.  You might decide that you want to develop a new variable called “HerdSize_Binomial” where 0 indicates small herds (<124 animals) and 1 indicates large herds (124 animals or more).  Take a look at Example 1 below….

Example 1: Distribution of Herd Size (HRD_SIZE)




Quantiles

100.0%   maximum    1328.0
 99.5%              1328.0
 97.5%              1186.3
 90.0%               487.2
 75.0%   quartile    249.8
 50.0%   median      121.5
 25.0%   quartile     84.3
 10.0%                48.0
  2.5%                12.9
  0.5%                 9.0
  0.0%   minimum       9.0

Moments

Mean             221.90385
Std Dev          246.72103
Std Err Mean      34.21405
upper 95% Mean   290.59142
lower 95% Mean   153.21627
N                 52

Here you can see several fun things.  For instance, in the “Moments” table we can see the sample size (n=52 households), the mean (221.9 animals), and a 95% confidence interval about the mean (153.2 to 290.6 animals).  Above, in the “Quantiles” table, we find the rationale for the percentage breakdown.  Our median (the number separating the higher half of the sample from the lower half) is 121.5 animals, meaning that about half of the households have more than 121.5 animals, and half have fewer.  We could have used this as the breakdown point as well.  It’s really up to the scientist, and that’s you, remember!
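The Moments table can be checked by hand.  What follows is a minimal sketch, assuming the usual textbook formulas (standard error = Std Dev divided by the square root of N, and a t-based 95% interval about the mean).  The multiplier 2.0076 is a hard-coded approximation of the two-sided 95% t critical value for 51 degrees of freedom; it is an assumption of this sketch, not a number that appears in the JMP output.

```python
import math

# Values copied from the Moments table in Example 1.
n = 52
mean = 221.90385
std_dev = 246.72103

# Standard error of the mean: Std Dev / sqrt(N).
std_err = std_dev / math.sqrt(n)

# ASSUMPTION: approximate two-sided 95% t critical value for N - 1 = 51
# degrees of freedom, hard-coded here instead of being looked up by a
# statistics package.
T_CRIT_95_DF51 = 2.0076

lower_95 = mean - T_CRIT_95_DF51 * std_err
upper_95 = mean + T_CRIT_95_DF51 * std_err

print(f"Std Err Mean:   {std_err:.3f}")
print(f"lower 95% Mean: {lower_95:.2f}")
print(f"upper 95% Mean: {upper_95:.2f}")
```

If the printed numbers agree with the table to a couple of decimals, the interval is most likely an ordinary t interval about the mean, which is worth knowing before quoting it in a paper.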

Descriptive statistics are really fun to look at, especially if you’re interested in your study.  If you’re not, they will suck.  But why would we want to do this in the first place?  Well, beyond giving us a lot of interesting information that will help us understand our data and write about it, they also help us move towards Step 3…
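The recoding of a continuous variable into a 0/1 variable at a cut point, as described in Step 2, can be sketched in a few lines.  The herd sizes below are hypothetical and chosen only for illustration, and the function name `dichotomize` is made up for this sketch.  JMP users would do the same thing through the menus.

```python
from statistics import median

# HYPOTHETICAL herd sizes (head of cattle per household), for illustration only.
herd_sizes = [9, 48, 84, 110, 121, 122, 250, 487, 1186, 1328]


def dichotomize(values, cut_point):
    """Return 0 for values below the cut point and 1 for values at or above it."""
    return [0 if v < cut_point else 1 for v in values]


# Use the median as the cut point, one of the options discussed in Step 2.
cut_point = median(herd_sizes)
herd_size_binomial = dichotomize(herd_sizes, cut_point)

# Share of households that fall in the "small herd" category (coded 0).
share_small = herd_size_binomial.count(0) / len(herd_size_binomial)

print("cut point:", cut_point)
print("recoded:", herd_size_binomial)
print("share small:", share_small)
```

The one design decision hiding in here is where a herd that sits exactly on the cut point should go.  This sketch puts it in the large category; whichever way you choose, write it down so the rule is the same every time the database is rebuilt.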

Step 3: Bivariate Analysis. You’ve just created a HerdSize_Binomial variable based on your Descriptive Statistical breakdown of data on the number of cattle per household.  You have two scores: 0 and 1, for small and large herds respectively.  If there is any point at all to your study, you also have a dependent variable, outcome variable, or something that you are investigating.  Since I have been looking at diarrhea, and causes of diarrhea in calves, let’s use diarrhea as an outcome.  0 means no diarrhea in the herd, and 1 means diarrhea.  These values were obtained by observations of animals with diarrhea in the field.  Now what we want to do is determine if herd size is associated statistically with diarrhea, or whether the outcome variable is explained by the covariate (explanatory variable).  If you want to know the theory, please go to the library and look up the statistics.  If you’re in a hurry, just push a button on your stats program, or enable your vast working knowledge of code to run R or SAS software and “Fit Y by X” to compare outcome by covariate.

What happens, especially in the case of two nominal variables (0,1 vs. 0,1), is an output of contingency tables (2×2 tables) with subsequent Chi-Square values and their associated probabilities.  If there is a statistically significant relationship, these probabilities will be less than 0.05 or 0.01, depending on what you have determined to be statistically significant.  I used 0.1 for the bivariates to look at a greater range of possible covariates to explain diarrhea for use in further, more complex analytical procedures.  There will be a Fisher’s Exact Test as well, which for 2×2 tables is more rigorous than the Likelihood Ratio or Pearson tests run by your program, because it computes an exact probability instead of relying on a large-sample approximation.
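For the curious, here is a minimal sketch of what the Pearson and Likelihood Ratio tests compute for a 2×2 table.  It uses the standard textbook formulas and makes no claim to mirror JMP’s internals.  The counts are hypothetical, and the function names are made up for this sketch.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def likelihood_ratio_chi_square(table):
    """Likelihood ratio statistic: 2 * sum of observed * ln(observed / expected)."""
    expected = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, expected)
                   for o, e in zip(obs_row, exp_row)
                   if o > 0)  # empty cells contribute nothing to the sum


def p_value_df1(chi_square):
    """Upper-tail probability for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


# HYPOTHETICAL counts: rows = covariate (0, 1), columns = outcome (0, 1).
table = [[10, 20],
         [20, 10]]

for name, stat in [("Pearson", pearson_chi_square(table)),
                   ("Likelihood Ratio", likelihood_ratio_chi_square(table))]:
    print(f"{name}: ChiSquare={stat:.3f}  Prob>ChiSq={p_value_df1(stat):.4f}")
```

A 2×2 table has one degree of freedom, which is why the p-value helper can get away with the complementary error function from the standard library.  Both statistics lean on a large-sample approximation, so they get shaky when expected cell counts are small.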

Example 2: Bivariate Test of Clinical Diarrhea (CLN_DIAR) by Herd Size (Total_HerdSize_Rank_Binomial):

Contingency Table (2×2 Table)

Total_HerdSize_Rank_Binomial By CLN_DIAR

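Rows: Total_HerdSize_Rank_Binomial (1, 2); columns: CLN_DIAR (0, 1)
Cell contents: Total %, Col %, Row %
Surviving cell values: 6 (row 1), 18 (row 2)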


N    DF   -LogLike    RSquare (U)
52   1    6.7637548   0.1885

Test               ChiSquare   Prob>ChiSq
Likelihood Ratio   13.528      0.0002
Pearson            12.942      0.0003

Fisher’s Exact Test   Prob     Alternative Hypothesis
Left                  0.0004   Prob(CLN_DIAR=1) is greater for Total_HerdSize_Rank_Binomial=1 than 2
Right                 1.0000   Prob(CLN_DIAR=1) is greater for Total_HerdSize_Rank_Binomial=2 than 1
2-Tail                0.0007   Prob(CLN_DIAR=1) is different across Total_HerdSize_Rank_Binomial

In this case, we can see that Clinical Diarrhea is indeed associated with herd size, as there is a statistically significant relationship shown by the P-values (Prob>ChiSq) of 0.0002 and 0.0003 for the Likelihood Ratio and Pearson tests, and verified by Fisher’s Exact Test (left-tail p=0.0004, 2-tail p=0.0007), where these values are less than 0.05, our level of significance.  Therefore, we reject the null hypothesis that there is no relationship between the variables.  However, this significance is between Clinical Diarrhea in calves and small herds (see the Fisher’s Exact Test “Left” row and its associated Alternative Hypothesis), as the probability of diarrhea in calves is greater when the herd size value is 1 (small herd value).  This requires logically looking at your data, reviewing the distributions of herd size against other factors like location of these herds, water sources, management practices, numbers of small ruminants, and other factors that might be influencing diarrhea beyond just herd size.  But even though it is significant in a way that is contrary to our expectations, it is still worth reporting, and maybe worth analyzing in combination with other factors in a more complex multivariate model, the next phase of analysis.
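Fisher’s Exact Test can be sketched directly from the hypergeometric distribution.  The cell counts of the contingency table did not survive in the output above, so the counts used here are an assumption: 6 and 21 for herd rank 1, and 18 and 7 for herd rank 2 (no diarrhea, diarrhea).  They were chosen because they sum to N=52 and, worked through the standard formulas, give a Pearson Chi-Square of about 12.94 and a Likelihood Ratio of about 13.53, in line with the reported values.  The function name is made up for this sketch, and the tails are defined on the top-left cell, which may or may not be how JMP orients them.

```python
from math import comb


def fisher_exact_2x2(table):
    """Return (left, right, two_tail) Fisher exact p-values for a 2x2 table.

    Tails are defined on the top-left cell, with all margins held fixed.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    probs = {k: prob(k) for k in range(lowest, highest + 1)}
    observed = probs[a]

    left = sum(p for k, p in probs.items() if k <= a)
    right = sum(p for k, p in probs.items() if k >= a)
    # Two-tail: add up every table that is no more likely than the observed one.
    two_tail = sum(p for p in probs.values() if p <= observed * (1 + 1e-9))
    return left, right, two_tail


# ASSUMED counts (see the note above): rows = herd size rank (1, 2),
# columns = CLN_DIAR (0, 1).
example_2 = [[6, 21],
             [18, 7]]

left, right, two_tail = fisher_exact_2x2(example_2)
print(f"Left:   {left:.4f}")
print(f"Right:  {right:.4f}")
print(f"2-Tail: {two_tail:.4f}")
```

Because every possible table with the same margins is enumerated, there is no large-sample approximation involved, which is the sense in which the test is exact.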

Next Steps. You will conduct this procedure with every single covariate/variable in your data (though not all will be contingency tables as not all data will be categorical or dichotomous).  You will then interpret the results, repeating Steps 1 and 2 of database management and descriptives to obtain more easily analyzable data (like changing number of animals to a herd size rank), and then re-analyzing, continually repeating this cycle until you have no idea what the file names are, where the outputs are stored, where you live, when you last showered, why your fiance is mad at you, and what your study objectives were in the first place.

At this point, it is generally a good idea to re-read your research proposal, remind yourself of why you conducted fieldwork and under what hypotheses, refresh the logical and theoretical assumptions and framework that influenced the data collection, and then revisit your analytical results.  Otherwise, you are sure to attend a meeting where you present these bivariates, warranting a firm and authoritative “what’s the point of this study, and why are you showing me this?” response from your advisers and team members.

Who cares?

Next time: Analysis Part 2: Multivariate Modeling and the Beginning of the End….

Ingredients for Tasty Publications

I’ve been working on data analysis for the past month, and have completed all preliminary and bivariate analyses for the study, looking into associations between survey and sampling data and Cryptosporidium, Giardia, Cryptosporidium and Giardia together, Clinical Diarrhea, and households with herds shedding large quantities of oocysts and cysts of the protozoal pathogens.  I’m now moving towards the development of a model, which will allow us to determine with greater confidence the factors impacting infection in calf herds, factors that without a model cannot be looked at in combination with other elements.  I’ve put together a short list here of some of the statistically significant risk factors for infection for Cryptosporidium to illustrate what ingredients are considered in model design.  All statistics have been performed in the JMP statistical program, created by the SAS Institute.  I’ll post more on the process of analysis in a bit…

Model Recipe 1: Cryptosporidium a la mode…

Hypotheses/Assumptions. Crypto is a fecal-borne pathogen with water-borne capability.  Water sources in the area have been identified as contaminated with Crypto.  Water is a factor.  Feces is a factor.  Calves with more exposure to feces will be at higher risk.  Therefore, larger herds, more density, and managing manure will play a role.  My data show that smaller herds, less density, and managing manure increase infection, contradicting these claims.  A new management score will thus be developed to test for a combination/score/ranking significance to further flesh out these relationships.

Source: NorCal Bullfight!

Significant Ingredients:

NEORATIO (p=0.0711): Prob C+ > with increasing NEORATIO
Birth/Herd Ratio (Ratio of neonates/young livestock to adults)
NEORATIOS range from 0 to 1.  A ratio of 1 implies one calf per cow.  A ratio of 0.01 implies 0.01 calves per cow, biologically implausible unless cows have died, been sold, or calves have been purchased, traded, or stolen.  A herd averages one calving season per year, and so the range should be closer to 1.  More C+ at a ratio closer to 1 could mean more births per year for each cow, could imply larger herds with more bulls and cows, or could indicate a loss of cows, all factors that may imply death/loss, or more potential for fecal generation and thereby transmission of Crypto.

DTH_GTS (p=0.0835): Prob C+ > with increasing death
Number of goat deaths in last 12 months
Small ruminants are expected to be a reservoir for Crypto and Giardia, and more deaths could imply greater disease burden among this population.  No sampling was conducted here, and so it remains a priority for disease surveillance.  A small ruminant variable capturing births, deaths, and total herd size as a SR household score could be a way to better explore the impact of SRs on calf infection.  Note: calves and SR are often kept in the same boma, and may explain the frequent significance of SR on outcome variables.

MORT_RATIO (p=0.0325): Prob C+ > with increasing MORT_RATIO
Herd mortality ratio (number of deaths/number of animals in herd)
As MORT_RATIO approaches 1, herds begin to die off in their entirety.  A low MORT_RATIO indicates less death in the past year, a potential indicator of herd health.  Higher MORT_RATIOS may indicate unhealthy herds, or those closer to predation, as disease and predation are the two major causes of death.  Wildlife vectors, water scarcity, and nutrition may all be factors at play in herds facing predation (closer to wildlife management areas (WMAs), further from villages), while disease death may indicate overall herd susceptibility to infection.

BOM_MOVE (p=0.0016; RR=7.737): Prob C+ > in HHs that MOVE
Do you move the boma?
While moving the boma was hypothesized as being protective, households that move bomas may be responding to a build-up of feces indicative of larger quantities of fecal matter, or to disease in the herd requiring the construction/relocation for herd health.  Survey responses indicate that HHs move bomas primarily when they are full of manure, as this allows predators to easily enter over the acacia thorn barriers, and when it rains, as water and manure make for excellent disease incubating conditions.  Therefore, moving the boma is a response to disease instead of a protective measure.

Giardia 3D Model (Source: Tjamrog)

GIARDIA (p=0.0231; RR=3.33): Prob C+ > if herds are G+
Simple positive infection of Giardia in calf herd
This is an expected association: herds with Crypto are expected to also be infected with Giardia, as they share similar transmission dynamics.

HighShedding_Giardia (p=0.024; RR=2.05): Prob C+ > if herds are HSG+
High or low cyst shedding calf herds based on histogram breakdown
This is also expected.  Herds with Crypto are expected to also be infected with Giardia, and herds with both Crypto and Giardia infections are expected to be higher shedding herds as they are more at risk, and most likely exposed to a variety of disease causing organisms.  These herds (C+, G+ HSC+ and HSG+) are definitely the most at-risk HHs in the study.  A closer examination of these herds is in order.

LS_H20_CLEANED (p=0.646 no Fisher; Kappa=0.016): Prob C+ > for non-SW users
Water sources (3 categories: surface, well/spring, and multiple)
Surface water was hypothesized to be a factor in disease transmission, and Crypto has been identified in several water sources in the area.  However, it is the non-surface water households that were identified with greater probability of infection.  These are a minority in the sample, and while it is possible that non-surface water sources are also contaminated, more characterization must be done.  Herds relying on well/spring water may be located closer to villages, may be more intensively managed, and therefore may be more prone to infection due to the fecal borne transmission route.  I must look into these elements in more detail.

Stocking Density (herd) (p=0.0931): Prob C+ < with increasing density
Ratio of total number of animals to primary boma area (animals/m2)
This is truly an odd finding; counter intuitive and contradictory to hypotheses.  Increasing density should increase exposure to feces and thereby transmission.  Low density however, indicates either smaller herds or larger bomas. Smaller herds can be an element of total income (another variable to look into), poverty, marginality, death, trade, or livestock exchange (no data on this).  It is possible that smaller herds and greater poverty are located in areas with poor forage and access to resources (Barabaig) and may be less healthy overall due to location (need to run spatial data in ArcGIS to test).  Larger bomas may indicate more space for boma expansion, or the existence of a once large herd that has diminished in size due to disease/predation/exchange.  Greater C+ in the lower density category therefore may be indicative of factors like poverty or livestock losses.  Less dense herds and livestock deaths should be investigated more closely, along with income and poverty variables to get at the question: why is density important?

Stocking Density (calves) (p=0.0912): Prob C+ < with increasing density
Ratio of total number of calves to calf boma area (calves/m2)
See above.  For calves, less density means fewer births or a larger calf boma.  Fewer births may mean more unhealthy cows, cow losses, low milk production and higher calf mortality, or simply bigger calf bomas.  Larger calf bomas may in fact be due to larger herds of small ruminants, as the two are often housed together.

Area_Calf_Boma (p=0.0064): Prob C+ > with more area
Total area of calf boma (m2)
Larger calf bomas may in fact be due to larger herds of small ruminants, as the two are often housed together, or due to larger calf herds.  No density is in this variable, and though density trended opposite (less dense herds > risk of C+), it is possible that it was a low calving season, and the boma itself is a greater indicator of calf herd size.  Was it a low calving season due to drought or other factors?  Damn lack of longitudinal data!

ZOO_DIS (p=0.0438; RR=2.724): Prob C+ > if HHs believe diseases have zoonotic potential
Do you think bad things can be passed between animals and humans?
This is a purely qualitative variable that says little about disease risk factors for calves.  Plus, it seems that educated households have more infection.  Is this because more educated households are closer to villages, are studied more, are provided with more information?  Are the other HHs remote (Barabaig) or closer to the park (so more education)?  This can go many ways.  Not a candidate for the model, but interesting to touch on for the paper and extension.

Washing Udders! (Source: David B. Frankhauser)

UD_WSH (p=0.0386; RR=0.397): Prob C+ > non-udder washing HHs
Do you wash the udder of the cow?
Indeed.  Washing the udder of the cow is in fact protective: an RR of 0.397 suggests that udder-washing HHs carry roughly 40% of the risk seen in non-washing HHs.  Washing the udder may remove contaminants that would directly infect the calf through ingestion, as well as the HH through flow to milk container and contact with hands.  It is nice to see that this is working out here.  Add to model, combine with management score as well.

MLK_AMT_reclass (>1L) (p=0.0292; RR= infinity?, OR=0): Prob C+ > for herds with calves consuming more than 1L milk at a feeding
Categorical binomial breakdown of calf milk consumption (>1L; <1L)
Odds ratio indicates that all herds with calves consuming more than 1L are infected.  Why?  We know that this survey question was strange for pastoralists to answer as they measure the milk for themselves but allow calves to simply suckle.  However, such a variation in responses was obtained that perhaps there is something here.  Calves with greater access to milk may come from larger herds and be more exposed to feces.  More access to milk may mean less washing of udders (look into this).  More access to milk may mean fewer household members with milk demands, and less labor to manage bomas, attend to animals, etc.  Weak associations perhaps, but interesting to discuss.  Add to model?  Maybe.
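Several of the ingredients above come with a relative risk (RR) or an odds ratio (OR).  Here is a minimal sketch of how both fall out of a 2×2 table, using the standard definitions.  All counts are hypothetical and the function names are made up for this sketch; none of these numbers come from the study.

```python
import math


def relative_risk(exposed_pos, exposed_neg, unexposed_pos, unexposed_neg):
    """Risk of infection in exposed herds divided by risk in unexposed herds."""
    risk_exposed = exposed_pos / (exposed_pos + exposed_neg)
    risk_unexposed = unexposed_pos / (unexposed_pos + unexposed_neg)
    if risk_unexposed == 0:
        # Division by zero: undefined if both risks are zero, otherwise infinite.
        return math.nan if risk_exposed == 0 else math.inf
    return risk_exposed / risk_unexposed


def odds_ratio(exposed_pos, exposed_neg, unexposed_pos, unexposed_neg):
    """Cross-product ratio (a * d) / (b * c) of the 2x2 table."""
    numerator = exposed_pos * unexposed_neg
    denominator = exposed_neg * unexposed_pos
    if denominator == 0:
        return math.nan if numerator == 0 else math.inf
    return numerator / denominator


# HYPOTHETICAL counts: 20 of 30 exposed herds positive, 5 of 20 unexposed.
print("RR:", round(relative_risk(20, 10, 5, 15), 3))
print("OR:", odds_ratio(20, 10, 5, 15))

# HYPOTHETICAL zero-cell case: every exposed herd is positive.
print("OR with an empty cell:", odds_ratio(8, 0, 20, 24))
# Same counts with the reference group swapped.
print("OR, groups swapped:", odds_ratio(20, 24, 8, 0))
```

An RR below 1 means the exposed group carries less risk, and an RR above 1 means more.  When one cell of the table is empty, the odds ratio collapses to 0 or blows up to infinity depending on which group is treated as the reference, which is one possible reading of the odd-looking MLK_AMT_reclass output.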