This lecture on cost estimation and workflow explores how problem-solving activities link together and emphasizes the role of effective workflows. There is no one-size-fits-all set of tools for cost estimation, which underscores the need for diverse approaches in real-world settings. The class notes outline a workflow for expressing knowledge and navigating uncertainty in cost estimation, and three workflows, from data science, statistics, and managerial accounting, are examined, each offering a distinct perspective on the same goal.
The lecture stresses the importance of selecting dependent variables, identifying drivers, and understanding cost behavior. Practical steps, including analyzing regression lines and verifying model accuracy, lead to the closing themes: thoughtful analysis, attention to anomalies, and alignment with real-world observations.
Today’s topic is cost estimation, and I’d like to discuss the concept of workflow. Workflow refers to how various activities in problem-solving connect, such as graphing data, plotting, estimating models, and writing reports. These are individual tasks, and the workflow is about linking them to move from a question to data to an answer. Effective workflows often loop back on themselves, becoming perpetual throughout one’s career.
The second point I want to emphasize is that there’s no perfect set of tools or approaches. While you may have encountered statistics in math or accounting courses, real-world scenarios involve different estimators, processes, and workflows, each with their own benefits and costs. The goal is to match the workflow to the specific problems at hand. A good workflow generates both answers and new questions, helping express what you know, how well you know it, and what you don’t know.
The class notes for today outline a list of steps designed as a workflow to express knowledge, capture certainty, and articulate uncertainties in the context of cost estimation. I’ll start by reviewing three relevant workflows: one from data science, another from statistics, and the third from managerial accountants. Despite their differences, these workflows essentially aim to achieve the same goal, just from slightly different perspectives.
The data science workflow, known as OSEMN (obtain, scrub, explore, model, interpret), involves obtaining data, cleaning it, exploring it informally, and then modeling the data. This process helps in estimating relationships. Similarly, the statistics and managerial accountants' workflows serve the same purpose but approach it from their own angles.
In the data science workflow, the first step is to obtain data, followed by cleaning it to ensure consistency. Exploration involves informal plotting to check if everything aligns with expectations. Questions about data variations and missing elements arise, leading to the next step of modeling the data. This involves creating a structure or aggregation, often represented by a regression line. Finally, interpretation and formal reporting occur, serving as the decision-making step in managerial accounting.
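The OSEMN loop described above can be sketched in a few lines of Python. This is a minimal illustration with invented data and made-up column names, not the course's own materials:

```python
# A minimal sketch of the OSEMN loop (obtain, scrub, explore, model, interpret)
# using pandas and numpy. File contents and column names are invented.
import io
import numpy as np
import pandas as pd

# Obtain: fake a small CSV the way it might arrive from a bookkeeper.
raw = io.StringIO(
    "month,machine_hours,total_cost\n"
    "Jan,100,5200\n"
    "Feb, 120 ,5900\n"         # stray spaces: a typical cleaning problem
    "Mar,not recorded,6100\n"  # a non-numeric entry
    "Apr,150,7000\n"
)
df = pd.read_csv(raw)

# Scrub: coerce columns that should be numeric, drop rows we cannot use.
df["machine_hours"] = pd.to_numeric(df["machine_hours"], errors="coerce")
clean = df.dropna(subset=["machine_hours"]).copy()

# Explore: quick informal checks ("are the magnitudes sensible?").
print(clean.describe())

# Model: fit a line total_cost = a + b * machine_hours.
b, a = np.polyfit(clean["machine_hours"], clean["total_cost"], deg=1)

# Interpret: the slope estimates cost per machine hour.
print(f"estimated cost per machine hour: {b:.2f}, fixed component: {a:.2f}")
```

In practice the loop cycles: the `describe()` step often sends you back to scrubbing before any model is worth fitting.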
By understanding these workflows, we can see how they all contribute to the same goal of effective cost estimation, offering different perspectives on approaching and solving problems.
Now, much of the data acquisition and cleaning in managerial accounting may be performed by bookkeepers or financial accountants. Data is collected and prepared in various areas of the firm, creating a workflow that may seem linear from top to bottom but, in reality, cycles back and forth. There are instances where you return to obtaining or scrubbing data, noticing things during other steps. It’s essential to impose order on this seemingly chaotic process. While the workflow may appear disorganized, the goal is to move towards a structured workflow, progressing from step A to B to C.
Now, let’s consider a statistician’s workflow. Statisticians excel at plotting and modeling. They enter the scene after the data is ready. The first step for them is to plot the data, often starting with scatter plots, exploring different variables’ relationships. This is crucial before diving into fitting models or calculating correlations. The second step involves fitting a model to the data, extracting variation to express the fundamental relationship between variables clearly. The third step combines the model with the data to assess its accuracy, and the fourth step involves model improvement and additional data gathering to iterate and enhance the model.
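That four-step loop can be roughly sketched in Python on invented data (the real workflow would of course begin with actual scatter plots):

```python
# A sketch of the statistician's loop on invented data: plot, fit, then put
# the model back against the data and look at what it misses.
import numpy as np

hours = np.array([10, 20, 30, 40, 50, 60])
cost = np.array([320, 410, 515, 600, 705, 790])

# Step 1 (plot): with matplotlib you would scatter cost against hours first.
# Step 2 (fit): extract the trend.
slope, intercept = np.polyfit(hours, cost, deg=1)

# Step 3 (assess): compare fitted values with the data.
residuals = cost - (intercept + slope * hours)
worst_relative_miss = np.max(np.abs(residuals) / cost)
print(f"slope: {slope:.2f}, worst relative miss: {worst_relative_miss:.1%}")

# Step 4 (improve): a visible pattern in the residuals would send us back
# for a better model or more data.
```

Here the residuals are small and patternless, so a linear model is a reasonable stopping point; real data rarely cooperates this well.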
How does this translate into the work of accountants? We approach things slightly differently, given our knowledge of how data is produced within the company. We choose a dependent variable, identify potential drivers (independent variables or allocation bases), and determine the underlying cost behavior for budgetary expense line items. The management accountant’s workflow starts with developing a cost accounting system. After identifying cost behavior, we follow similar steps to statisticians, plotting, analyzing data, estimating the cost function, and testing it against the data. The goal is to create a stable and reusable model for cost analysis and decision-making.
In summary, while statisticians focus on understanding the data-generating process, accountants aim to deliver a stable, reusable model for cost analysis and decision-making. Developing this model requires fundamental programming skills, emphasizing the importance of data science tools in the process. The transition from manual instructions to computer code has streamlined these tasks, but the core steps remain consistent with those of statisticians and data scientists.
Step zero involves selecting the dependent variable, the cost to be explained. It's important to have at least one dependent variable for every cost center, with the possibility of having more. A cost center is akin to a small firm within the larger organization. The cost object is what we're determining the cost of; cost centers deliver products to the rest of the firm, and those products are treated as their output.
To illustrate, consider the scenario of a firm acquiring another one. Initially, it’s simpler to treat them as separate entities. Over time, as synergies develop or similarities emerge, you may group them into a cost center. This process is analogous to accounting for mergers and acquisitions.
The next stage involves identifying potential drivers, the independent variables or x variables that influence cost creation. This requires detailed knowledge of the production process, often best understood by low-level employees. Consulting is highlighted as a field where decision-making often occurs at a distance from those with the most information about cost generation.
Drivers should be both plausible and measurable, considering the tension between measurement challenges and the importance of capturing significant variables. Understanding the underlying cost behavior for all budgetary items is crucial. This involves finding contracts, suppliers, and market prices for resources.
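As a toy illustration of building cost behavior up from traced inputs, here is a hypothetical bottom-up cost function; every price and quantity below is invented:

```python
# A toy bottom-up cost function: the cost of one unit of output as a sum of
# traced inputs, each priced from a (hypothetical) contract or market price.
inputs_per_cup = {
    "beans_g": 18,
    "milk_ml": 150,
    "labor_min": 2.0,
    "utilities_min": 2.0,
}
unit_prices = {            # price per unit of each input (all invented)
    "beans_g": 0.02,       # $/gram, from a supplier contract
    "milk_ml": 0.001,      # $/ml, from a market price
    "labor_min": 0.30,     # $/minute of labor, from a wage agreement
    "utilities_min": 0.02, # $/minute of machine time and lighting
}

cost_per_cup = sum(inputs_per_cup[k] * unit_prices[k] for k in inputs_per_cup)
print(f"traced cost per cup: ${cost_per_cup:.2f}")
```

Once every input is traced to a contract, supplier, or market price, you can defensibly abstract back up to a single average-cost number.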
The data science portion of the process begins by gathering costs and plotting them to discern patterns. This step is vital in understanding the relationship between costs and drivers.
Moving to estimation, a cost function is derived, and questions of economic plausibility and goodness of fit are raised. Plausibility is assessed by considering factors such as the difficulty of manufacturing a product, different margins on products, and the predictability of bid outcomes.
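A minimal sketch of this estimation step in Python, with invented data: the intercept of the fitted line is read as a fixed cost and the slope as a variable cost per unit, and the plausibility checks are deliberately simple:

```python
# Fit a linear cost function cost = fixed + variable * units, then run
# simple economic-plausibility checks. Data are invented for illustration.
import numpy as np

units = np.array([100, 150, 200, 250, 300])
cost = np.array([2100, 2600, 3050, 3600, 4020])

variable, fixed = np.polyfit(units, cost, deg=1)  # slope, intercept

# Plausibility checks, not statistical tests:
assert variable > 0, "more output should not reduce total cost"
assert fixed >= 0, "a negative fixed cost is economically implausible"

# Does a prediction at a typical volume look reasonable?
predicted_at_200 = fixed + variable * 200
print(f"fixed: {fixed:.0f}, variable: {variable:.2f}/unit, "
      f"cost(200 units): {predicted_at_200:.0f}")
```

Goodness of fit is a separate question from plausibility: a line can fit tightly and still imply an impossible cost structure.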
The importance of testing at different levels of abstraction is emphasized to accurately express the cost function. Standard costs are introduced as budgeted or forecasted costs used for understanding variations from predictions.
In the final stage, the goal is to identify the smallest number of cost centers that accurately capture the cost function. This focus on simplicity is driven by considerations of maintainability, inheritability, and making tools that simplify rather than complicate tasks.
The workflow is applied to a problem set, involving steps such as plotting data, computing additional variables, and estimating cost functions through linear regression. The importance of maintaining simplicity in cost systems is reiterated to facilitate ease of use and handoff to others.
In this scenario, we are dealing with a large amount of data and many parameters. However, our focus should not be solely on statistical precision; we also need to pay attention to simple facts that can be observed visually. In this example, I will demonstrate the basic method of calculating a linear regression in Excel. The goal is to emphasize understanding the concept rather than getting lost in technical details.
To calculate the linear regression in Excel:
1. Plot the data as a scatter chart.
2. Add a linear trendline to the chart.
3. Display the equation and R-squared value on the chart.
While R-squared is a useful statistic, it's important to note that it's not the sole determinant of model fit. It is the ratio of the variation explained by the function to the total variation in the data.
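That ratio can be computed directly, which makes the definition concrete; here is a small Python sketch with synthetic, nearly linear data:

```python
# R-squared is the share of total variation that the fitted line captures.
# Synthetic data, chosen to be nearly linear.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x

ss_total = np.sum((y - y.mean()) ** 2)  # total variation in the data
ss_resid = np.sum((y - fitted) ** 2)    # variation the line misses
r_squared = 1 - ss_resid / ss_total

print(f"R^2 = {r_squared:.3f}")  # near 1 because y is nearly linear in x
```

A high R-squared here only says the line tracks this sample; it says nothing about whether the relationship is economically sensible.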
When analyzing the regression lines, it's crucial to verify the model's accuracy by comparing it with real-world observations. For instance, in a cost forecasting scenario, ensure the relationship between labor hours and cost aligns with contractual agreements.
Examining the provided graphs, it’s evident that some models are incorrect, possibly due to data presentation errors. The negative trend in total cost, especially when it should be increasing, raises questions about the validity of the data. One approach to address this is to aggregate the data to a more meaningful level, such as quarterly.
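One way to carry out that aggregation, sketched with pandas on invented monthly figures:

```python
# Aggregating noisy monthly cost figures to quarters before plotting or
# fitting. The numbers are invented for illustration.
import pandas as pd

monthly = pd.DataFrame({
    "month": range(1, 13),
    "total_cost": [500, 480, 530, 510, 495, 540,
                   560, 545, 575, 590, 580, 610],
})
monthly["quarter"] = (monthly["month"] - 1) // 3 + 1

quarterly = monthly.groupby("quarter")["total_cost"].sum()
print(quarterly)
# The quarterly totals rise steadily, a trend the monthly ups and downs
# can obscure in a scatter plot.
```

Aggregation trades resolution for signal: fewer points, but each one averages away measurement noise and timing quirks.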
In conclusion, a thoughtful analysis of the data, considering anomalies and verifying real-world alignment, is essential. This approach provides a more nuanced understanding of the models and allows for meaningful insights and future considerations.
So the topic for today is cost estimation, and I want to talk to you about workflow. The idea of workflow is essentially how the different activities you do when you approach a problem link together, right? Graphing some data, plotting data, estimating a model, writing a report: these are all individual tasks. And workflow is how you link tasks together to go from a question to data to an answer. Something that I'd like to point out to you today is that really effective workflows kind of loop back on themselves and are sort of perpetual as you go through your career. So then the second thing that I want to point out is that there's really no perfect set of tools, no perfect set of approaches. You probably took some statistics, maybe in a math course, where you looked at estimating relationships between different variables, and you've definitely seen some statistical analysis topics in Accounting 2200. In those classes you were asked to determine with certainty whether or not something was statistically significant, or whether it had the correct properties. The point I want you to take away from this first bullet point is that's not really how things work in the real world, right? Different estimators, processes, and workflows all have useful attributes, benefits and costs, and we're just trying to match our workflow to the problems. So a good workflow will generate answers and more questions, and it will help you express what you know, how well you know it, and what you do not know. This final point is what we want to generate as we approach the topic of cost estimation. If you looked at the class note for today, there's a list of a bunch of steps.
The point of those steps is to be a workflow that helps us express what we know, capture how certain we are about it, how well we know it, and then also articulate what we don't know. So I want to begin by reviewing three workflows that I think are relevant: a common approach to data science that people blog about all the time, a common approach to statistics, and then a managerial accountant's workflow. The reason I want to show you these three is to show you how they're all doing the same thing, just from a slightly different angle. So the data science workflow is called OSEMN: obtain, scrub, explore, model, interpret. And this is a link to a data science course that, if you folks are interested, is free on the internet. It's the sort of thing I read before going to bed at night to relax myself. I'm sure all the personal anecdotes I've shared with you have led you to believe that I am a very nervous and nerdy person, and that's correct. So the first step is to obtain data. This is finding out where the data we need is, and it sort of presupposes the question: we have a question, we obtain data. The second thing we do is scrub the data, clean the data. Scrubbing or cleaning is getting rid of what we don't want and changing things into a format we can work with. Usually this involves making sure that the columns that should have numbers just have numbers, making sure that everything's in consistent units, that sort of thing. In the simple case of the assignments, if I were actually sharing this from my computer I could show you the little process
I went through to take the file that I downloaded from Canvas and turn it into something that I can quickly graph. In a lot of cases, if you have a nice way of interacting with the data source, you can do this really quickly. So then we explore the data. This is informal plotting; we're just checking to see if everything works the way we think it should. Are the columns we expect to be full of numbers actually full of numbers? Are they of the right magnitude? Are there too many zeros? Too few zeros? How many values are we missing? Then we come up with questions like: where's the data for February? What units do we measure fuel consumption in? And what is the variation underlying the data? Now we're developing ideas about how to do the next step, which is to model the data. When we explore, we're just plotting things, doing informal things; we're not really testing hypotheses. We're making sure everything works how we think it works for the things we're pretty sure about, and then asking ourselves questions: why is this this way? Why does this data have this particular pattern? Is this a real pattern, or does it just look like a pattern? So then we move to modeling. When we model the data, we're trying to create some sort of structure or aggregation; maybe this is a regression. In this class it will mostly be fitting a regression line. This allows us to say: these two things look like there's an association between them; let's put a number on that, let's say how associated they are. That's what modeling is: it's all about formalizing, about estimating the relationships. And then we interpret, and this is where we make formal reports and do communication; in managerial accounting this would be the decision-making step.
Now, a lot of the obtaining and scrubbing in managerial accounting might be done by bookkeepers, by financial accountants. There are a lot of other places in the firm where data is going to be gathered and prepared for you. I describe this workflow as running straight from top to bottom, but in reality it cycles, and these things bounce back and forth. You go back to obtaining data or scrubbing data at certain points, and you'll notice things when you're trying to do other steps, so you'll end up moving back and forth. What we want to do is impose order on chaos, and this does look a little chaotic. The reason I drew the connected graph was just to show you that you can jump between these steps as needed. But I do think it's a good idea to work toward an organized workflow that moves in a particular order, from A to B to C and so forth. Okay, then a statistician's workflow. Statisticians are usually lucky: they're the ones that are really good at just the plotting and the modeling steps, so they jump in after the data is all ready. The idea for the statistician is that we start by plotting the data, and this is actually how I asked you to approach the homework; in the homework, we were already at the explore step. We start by plotting the data. It's always a good idea to begin with a scatter plot, and with several scatter plots: switch the axes, switch out the variable on the x-axis, see how different things are related to one another. Really often in academic settings like this, where we're not interacting with the real world very much,
we jump straight to fitting models or calculating correlations. We say A is correlated with B at this percent, but that's moving way too far down the pipeline, down the workflow. I want to start out by just asking: what does all this data look like together? Are there patterns? Are there not patterns? Then, in the second step, we fit a model to the data. One way to think about fitting a model to the data, running a regression, is that it sucks out a bunch of variation and tries to get to a clean way of expressing the fundamental relationship between two variables. The reason we want to start with a plotting exercise is that we want to see that variation, see how it all looks, and then come up with a trend, because then we'll be able to know what that number means. Does that number actually describe what's going on? Then the third step is really just a combination of the two: in the third step, we put the model that we estimated back into the data so that we can really see to what extent the model accurately portrays the data. And then in the fourth step, we can improve the model and gather more data. These steps will help us iterate on the model and come up with a better, more useful model. Now, when I say better, more useful model, I also mean that in the model improvement step, we may realize that our data does not allow us to ask and answer certain questions. We'll actually see what I think is a pretty good example of that in problem one. Okay, so how does this all translate into the managerial accountant's work?
We're going to approach things slightly differently, but I want you to think about the fact that what we're doing here is just a particular type of statistics, a particular type of data science, where we happen to know a lot about how the data gets produced. Statisticians plot, think, and gather more data because they're not inside the data generation process; they're trying to figure out what generated the data, whereas we actually work for the companies that generate the data. We can go ask people things. We can find out more concrete information about these relationships. So we choose the dependent variable, then we identify potential drivers. These are the independent variables, the allocation bases. In the context of cost, when we say allocation, we mean the allocation of costs to objects, not allocating resources in the sense we might talk about in other situations. Then we want to identify the underlying cost behavior for all the budgetary expense line items. And you'll notice that while the data science workflow was general, applying to any data set, the management accountant's workflow starts from developing a cost accounting system. We're doing things like identifying underlying cost behavior based on expense line items, which is really just the act of going and learning what we already know about how costs are generated in the firm. Then, once we have those ideas, we plot and analyze the data; now we're doing exactly what a data scientist or a statistician would do. Then we estimate the cost function, and then we test the cost function against the data. These are the three core steps of the statistician's workflow, and they're a huge part of the data scientist's workflow.
Then we test against the data, and then we do something kind of interesting: we try to come up with the simplest functional model. This is a place where the statistician's workflow is focused on modeling the data, on learning something about the data-generating process, but what we are actually going to deliver is a system for gathering costs, analyzing them, and using them as inputs to decisions, a system that is somewhat stable and reusable over time. The fact that we're trying to come up with a model that's reused is one of the reasons why I've been encouraging the department as well as students to really focus on your data science tools, because making a simple, reusable model is a fundamental programming task. This is making an application that can be used by other people, whether it's a literal application that people have on their tablets or their laptops, or a set of practices that everyone does over and over. In the old days this was all done on paper, and the cost system was a series of instructions that humans followed. Slowly, over time, we've been putting those instructions into computer code, and the computer does a lot of those steps. But as we develop the model, we need to do a lot of the same steps that data scientists and statisticians do. Okay, so now let's go through these. You might notice that my slides are a little more complete today. For those of you who were asking for slides so that you can follow along, in case it's easier for you to read than to follow what I'm saying aloud, I tried to put as much of what I wanted to say onto the slides as possible. You'll also notice that I don't have questions today, and that's because I spent so much time trying to get everything I was thinking onto the page that I didn't end up having time to write questions.
Okay. So, step zero: we're choosing the dependent variable. This is the cost we're trying to explain. You may need more than one, but you need at least one for every cost center. Now, what's a cost center? Think of a cost center as a little firm within the firm. Remember, the cost object is the thing that we're trying to determine the cost of, and when we have multiple cost centers within a firm, each cost center delivers some product to the rest of the firm. We can treat that as its output; that's the cost object for that cost center, for that department, division, et cetera. To think about this in a concrete way, think about how you would do a costing exercise within a firm that just acquired another firm. Well, the easiest thing to do would be to take their managerial accounting system and your managerial accounting system and just treat them as though they're completely separate: the shipping company that we bought is now our shipping department, and they just sell us shipping services, and so forth. Very simple: for a firm you just acquired, you treat them as though they're completely separate. Then over time, you may group them into a cost center with other divisions in your company as synergies develop: because things are proximate, maybe the differences are immaterial, maybe you reduce some management overhead and combine the management of two parts of the company after an acquisition. Maybe two parts of the company just have similar cost structures, so it's easy to do it together, or they perform a similar function. So over time, if you remember your accounting for mergers and acquisitions, you initially have to keep a formal set of books separately for the combined entity.
And then, after a little bit of time, I think it's one accounting cycle, you can completely combine the books; you don't have to keep them separate. But from a cost accounting perspective, we're just going to keep them separate as long as it makes sense. When we were talking about firms with interactions in their cost functions, where you have synergies, it's synergies that drive you to combine the firm with other firms, to merge, to combine products in the same firm. It's also going to be synergies that drive grouping things together into single cost centers. And if you're going to spin off a company: some of you have Lenovo ThinkPads. The ThinkPad was originally an IBM product, and IBM spun off, or spun off and sold, that business to Lenovo. Before they sold the ThinkPad business, they probably started cost accounting for it separately. Now, they had to do this in order to prepare for the sale, but also, if it makes sense to sell a division of your company, it should be pretty straightforward to separate the accounting for that division. Because if there are synergies, then cost centers are going to be combined, and the firm is going to be better together than apart. But if there aren't synergies, it should be pretty easy to split the books, and it should be pretty easy to spin them off. So one thing that you can keep in mind as you think about the complexities of combining cost centers: if it's really hard to combine two divisions into a cost center, probably don't. And also, think about whether or not it makes sense to
have them be in the same firm at all. Now, there can be good reasons for that complexity, but this is also a question that should be raised. Okay, the next stage, stage one, is to identify the potential drivers. Remember, drivers are the x variables, the independent variables. These are things that cause costs to be created, things that cause resources to be sacrificed. This requires detailed knowledge of the production process. Something I want to emphasize here is that the details of the cost system are most intimately known by the low-level employees, and this creates a lot of decision-making problems. The people with the best information about how to cut costs are rarely the people who get asked how to cut costs. When I did the survey at the beginning of the semester, a lot of you mentioned that you're interested in doing consulting. Yesterday I spent four hours interviewing applicants for a scholarship about community service, and for some reason they all want to be consultants. That makes this point particularly salient. One of the number one things that consultants are brought in to do is to help a company cut costs. So management, who is already quite far away from the intimate details of the production process, hires someone even further away to tell them how to cut costs. Something I want you to keep in mind is that if you are a consultant and you come in and try to figure out how to cut costs, you are going to be very far away from the information about how costs are actually generated in the firm. Now, it may not matter that much to your firm; McKinsey is famous for coming in, showing people spreadsheets, telling them to cut a bunch of costs, and then leaving and letting the company deal with the fallout. But one of the most important things that you should do as a management accountant is develop relationships at the lowest level of the production process.
The people who are actually there when the materials get consumed. If you are allocating direct labor, the person that does the direct labor is the person you should talk to, because their manager's manager's manager isn't going to be the one that understands why the particular materials that we're using are breaking the machines. It's the person that's feeding the material into the machine that will understand how costs are generated. Now, it's of course going to be your job to manage this, to think about how to optimize these things and propose solutions, but the most effective solutions will come from dialogue up and down, or across, the entire value chain. Part of the reason that I'm emphasizing this is, again, many of you mentioned that you're going into consulting. When you first start consulting, you're going to go along on a project, learn from example, watch everyone else. The senior folks are going to be busy, because their customer is management; that's the client of your firm, the people that hired you. So the senior folks are going to be focused on making management happy. Something junior folks can do on consulting projects is be eyes and ears. Talk to people, find out how costs are actually generated, find out where bottlenecks are, in ways that the senior folks, who are more focused on client service, don't necessarily have the chance to. Okay, drivers should be both plausible and measurable, and there's some tension here, because if we can't measure the most important things, we may still want to measure something. For example, the university cares about your learning, but they ask you to evaluate me now, which is largely determined by your experience in class today, not by the relevant question of whether or not you're able to use this information later.
So there is a sense in which our incentives are shaped by what's measurable rather than what we actually want to incentivize. Okay, then we want to identify the underlying cost behavior for all the budgetary expense items, for all the line items. Everything that we're going to need to purchase at some point, we need to figure out how much it costs. Everyone that we're going to need to hire, we need to figure out how much we're going to pay them. This is identifying underlying cost behavior. There's always going to be a contract, a supplier, a market price; we have to get this thing from somewhere, and this information is available to us. Now, this is actually very interesting. In the statistics world, in the data science world, we often receive data and we don't know the data-generating function; we don't know what created the data. In textbooks, and in all the problems I'm giving you, everything is based on this notion: here's a dump of data, run regressions and see if you can recover the parameters, recover information about the process that generated the data. But this stage two says: don't even look at the data. Go find the contracts that generated the data, go find the suppliers, the markets. Do you buy oil at your company? If you buy oil, then let's look at where you buy oil from and what the historical data on your purchases of oil looks like. Last year, when I got terrible teaching evaluations, I had the students build an incredibly complicated model of costs at a coffee company, which I thought was the most fun thing anyone has ever been asked to do in school. But, to my surprise, that was not the perception of the students. So, since this was a coffee company, one of the things we did was download historical coffee bean prices from Asian markets, right?
Because if we're going to open a coffee company and think about how much it's going to cost to run the company, one of the things we need to know is how much coffee costs, and the cost of coffee bounces all around. So the question arises: how do we parameterize this? How do we express the cost of coffee in our cost system? What about the price of labor? These are all answerable questions, not mysteries. We can go find the person who knows; somewhere there's a contract, a supplier, a market where you get the resources you need, and that is going to be the foundation of your cost system. Now, you may abstract away from those details in your final cost system, but you also need to be data detectives: find out who is signing the checks, who is paying for this thing, where it is coming from, and that is going to help you identify the underlying cost behavior. So you'll end up with a crazy-looking cost function: the price of beans, plus the cost of labor for this amount of time, plus the cost of lighting for this amount of time, plus the water, plus the milk, and on and on, a very long, complicated set of variables. But these are pretty simple items; we just have to get to the bottom of what they are. Now, we may abstract back up to "it costs on average $1 to make one cup of coffee at our shop," but we came to that conclusion because we tracked down every input to the cup of coffee. I'm not making you do that this year, because when I made students do it last year, they hated it, and I don't want you to hate it. So I instead asked you to do something much less complicated. Okay, so now we're into the data science portion of the process. We found out what all those things were; we found out where all the costs were coming from.
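To make this concrete, here is a minimal sketch of what such a disaggregated cost function might look like in Python. Every price and quantity below is a hypothetical placeholder, not data from the course; the point is only that each term traces back to a contract, a supplier, or a market price.

```python
# A disaggregated cost function for one cup of coffee, as a sketch.
# All prices and quantities are invented placeholders.

def cup_cost(bean_price_per_kg, grams_of_beans,
             wage_per_hour, minutes_of_labor,
             milk_price_per_liter, liters_of_milk,
             overhead_per_cup):
    """Total cost of one cup: the sum of its traced inputs plus overhead."""
    return (bean_price_per_kg * grams_of_beans / 1000   # beans, by weight
            + wage_per_hour * minutes_of_labor / 60     # barista time
            + milk_price_per_liter * liters_of_milk     # milk
            + overhead_per_cup)                         # lighting, water, rent, ...

# Made-up numbers; the structure is what matters, not the values.
cost = cup_cost(bean_price_per_kg=8.00, grams_of_beans=18,
                wage_per_hour=15.00, minutes_of_labor=2,
                milk_price_per_liter=1.20, liters_of_milk=0.2,
                overhead_per_cup=0.10)
print(round(cost, 2))  # roughly a dollar a cup with these placeholder inputs
```

Once every input is tracked down this way, you can abstract back up to a simple average cost per cup, as described above.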
Now we gather the costs together and just plot them. See what the data tell us. I'm using casual language here, but this is really just as simple as dumping data into a spreadsheet, highlighting it, clicking a scatter plot, looking at it, and thinking, huh. Looks like it goes up to the right. Looks like it goes down to the left. Very interesting. What if I plot this as a function of time instead of volume? What if I switch the axes? Well, it usually just flips. What if I change the scale? Just play around with it and see what patterns there are. When I ask you what patterns there are, this is what I mean. A quick plotting example here: for those of you who are interested in following along and working through the course using Python rather than Excel, I'll put an example on this slide for you, but it's not there right now. Okay, the first step was plotting; the second step is estimation. In the example, I just asked you to estimate a cost function that's basically a slope, but this raises several questions. One of the reasons we want to be specific about the cost function (I'm going to draw it over here) is this: I'm asking you to plot your data and then fit a line through it, because this line is going to give us some information about the magnitude. We're going to put a number on it. Now, we want to be cautious about saying that this number is the truth. What this number is, is just this relationship: is it big or is it small? Well, let's call it five. Actually, this line's slope is pretty close to one, so the magnitude of the relationship is that when this moves by a bit, that moves on average by one. We're putting a number on it so that we can take that number and ask: is that plausible?
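As a stand-in for the Python plotting example mentioned above, here is a minimal sketch of "dump it in, highlight it, click scatter plot" using matplotlib. The machine-hours and labor-cost numbers are invented for illustration, not taken from the problem set.

```python
# A minimal Python version of "highlight the data, click scatter plot."
# Data are made up; the driver goes on the x-axis, the cost on the y-axis.
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to view the window
import matplotlib.pyplot as plt

machine_hours = [46, 48, 60, 62, 68, 72, 78, 82, 88, 94]              # driver
labor_cost = [710, 963, 770, 1004, 1190, 917, 1180, 1316, 1211, 1032]  # cost

fig, ax = plt.subplots()
ax.scatter(machine_hours, labor_cost)
ax.set_xlabel("Machine hours (driver)")
ax.set_ylabel("Labor cost")
ax.set_title("First look: does cost move with the driver?")
fig.savefig("labor_vs_hours.png")
```

From here you can switch the axes, change the scale, or plot against time instead of volume just by changing what you feed to `ax.scatter`.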
So we can evaluate the economic plausibility and the goodness of fit, and if we come up with a number that's not plausible, then maybe we're not done characterizing the cost function yet. This list of questions is in the handout and, of course, in the slides. When we talk about plausibility, the questions we should ask are things like: is it difficult to manufacture? If we have a product that's difficult to manufacture, and you talk to the people who make it and they tell you it's the hardest thing they make, and then you pull together all the costs and find that it's cheap to make according to your cost function, you're missing something. That should be a red flag. If you have different margins on different products, are there sensible reasons for the margins to be different, or is there possibly some cost that you're just missing? Another one: do competitors sell your high-margin products? If you have a high-margin product and competitors are selling it too, then you should ask yourself why this is a high-margin product for you when everybody else is also making it, because competition should drive your margins down. Margins are profits, and if a bunch of other people can do what you're doing, why are you getting paid? Why are you able to do it so much cheaper? Think about Apple: nobody else is making Apple computers, and there's a whole bunch of reasons why people pay different prices for Apple. There have to be reasons why this is happening; it can't just be some fluke. If you're able to sell a product everyone makes with a high margin, that's your secret sauce, and you'd better know what it is, because this is your advantage. You need to be able to articulate what it is. If you can't, you might be wrong about how much it costs to make that product.
Another signal is the predictability of bid outcomes. If you're in a firm that participates in auctions and you have a good sense of when you can win a bid, that means you understand your costs, your competitors' costs, and the costs of projects. But if your bids have unpredictable outcomes, if you don't understand why you won or why you lost, you probably have your costs wrong. Similarly, if small price increases have predictable impacts on customers, then you probably understand what's going on; but if you don't know what's going to happen when you move your price by a little bit, you're probably in a situation where you don't understand the underlying cost function. Okay. The next thing we want to think about is that in our exercises we did cost functions at very, very high levels of aggregation. I think every example had two types of costs and then a total cost. Total cost would be the most aggregated; then the two types of costs, in one case materials (I honestly cannot remember what the labels of the other columns were, but we'll look at them in just a second). The reason you want to test at different levels is so that you can understand what level of abstraction is needed to accurately express the cost function, the way the cost is generated. I probably should have broken this last point out into its own bullet point, but when you're establishing a cost system, it's a good idea to disaggregate as much as possible. Remember I was telling you about that really long cost function for a cup of coffee? You're not going to use that in your everyday cost system, but you are going to use it to develop your cost system, because that's going to be your gold standard for cost allocation. You may not need it on a daily basis.
But you need it in order to choose, or to develop, the cost system you're going to use on a daily basis. Throughout this, we refer to standard costs. Standard costs are a budgeting term that you probably saw in Accounting 2200; it came up if you took it from me, because I remember teaching it. The idea of a standard cost is: this is what we plan on sacrificing in order to do this activity once. So in my example, the standard cost of a cup of coffee was about a dollar. A standard cost is like your budgeted cost, your forecasted cost. We're going to use these later on to understand why things differed from our predictions, but right now we're just developing the system that's going to predict how much things should cost. Okay, and then in stage six, which I think is the final stage, we're going to identify the smallest number of cost centers that allows us to capture our cost function accurately. Now, I said "smallest" as if it has to be the smallest. Maybe it doesn't have to be the smallest, but since everything can be incredibly complex, we're looking for the least complex way to express things. This idea comes up in a lot of different domains. In software development, they often talk about how if you're programming at the edge of your competency, you're not going to be able to maintain what you build, so you need to simplify a little bit. Even though you could do something more sophisticated, you need to write code that's a little bit dumber than you are. And that's the idea here: we need to make a system that's a little bit dumber than we are. We could understand a system that was a little more complex, and we could build it, but we might not be able to maintain it. So here's the idea:
The reason for the smallest possible number of cost centers is really maintainability and inheritability. If you are the one going through all these steps, you are developing a cost system, which is a product that other people are going to use. You may be getting bored of me relating everything to software, but this is a sort of software. If we did this all on paper, the program would be the instructions that we handed off to each other, and we would be the humans. The word "computer" started out meaning a person who computed things, who did the computing; now machines do it. Many parts of this may end up being software, but either way, you are coming up with a system that you are going to give to someone else. Someone else is going to have to use it. So again, you need to make it a little dumber than you are, and a little dumber than they are, so that you can hand it off in a reliable fashion and people can continue to use it. In my research projects, I run into this problem all the time: I'm the person on the project who understands the data side of things, so I will often build a system for prototyping models that I think is relatively simple, straightforward, easy to use, and very efficient. Then I'll hand it off to a co-author, and they'll come back and say, "I have no idea what this system is," and they'll stop using my system, because my system was too hard for them. So again, the reason we are focusing on the smallest number of cost centers is that we want to make something we can maintain and hand off to other people, and that they can continue to use productively. Also, tools should make people's lives easier than not having them. If it's easier to hammer a nail without a hammer, then the hammer is not a hammer.
The hammer is only a hammer because it helps you hammer in nails. If you have some item that is worse at that than not having it, then you don't have a tool. So that's the idea here. Again, the reason we're simplifying is not because simplicity is some sort of abstract virtue, but because simplicity means we can give someone something that will be easy for them to use, and they'll actually be able to use it. We may sacrifice some accuracy at the margin in order to have something we can actually hand off to other people. Okay, so that's a review of the workflow of being a management accountant developing costing systems, and now we are going to work through the problem set; we're basically going to do the workflow we just talked about. Problem one is Trevor Kennedy, a cost analyst at United Packaging; they make cans. He wants to develop a cost function that relates engineering support costs to machine hours. There's labor and materials: labor is paid monthly, and the materials and parts are purchased from an outside vendor every three months. Then we prepare this monthly accounting report. So basically, if you remember the data scientist's workflow, we've skipped past all the collecting and cleaning, and here we are. Now, what do we do next? Well, we're going to start by creating three plots, for two reasons: one, because it's the next step in our workflow, and two, because that is the assignment. Again, the whole reason I'm asking you to do these assignments is so that you start to develop your own workflows for encountering complex, ambiguous problems. And here's an example of that in action, because the questions I'm asking you here are steps in the workflow.
Okay, so we're going to plot three things as a function of machine hours: labor, materials, and the total, where the total is just the sum of the two. So at the first stage we have a blank Excel sheet, and we're just going to highlight the data and Ctrl+C. Sorry, I use Mac and Linux. Okay, I can't even copy and paste properly. So the first step is just to paste in the raw data. Something to keep in mind is that dollar signs are not numbers in most statistical programs, so you may get errors or dropped values if something like this occurs; we could lose March just because there's a dollar sign in front of it. So one of the first things you might want to do in cleaning the data is confirm that there are no dollar signs in your dollar columns. In this case, Excel was relatively smart: when you click on the cell, you can see that the actual value in the cell is 347, and you'll notice this cell is formatted as general while that one is formatted as currency. Excel understood what it was and parsed it out. We need to make sure that happens, otherwise we may end up with errors. Okay, so there's our raw data. Maybe we'll do some other cleaning steps; I usually include the calculation of additional variables as part of cleaning and scrubbing. In this case, the only additional variable is a total cost variable, which we'll calculate so that we can easily plot and use it. The next step is to actually do the first question, and we're going to simply plot. This is going to be like what we did last time, but a little bit easier.
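Outside Excel, that cleaning step is easy to sketch in plain Python: strip the currency formatting so every cell parses as a number, then add the total-cost column. The rows and figures below are invented for illustration.

```python
# Cleaning sketch: strip "$" and thousands separators so values parse as
# numbers, then add a total-cost column. The rows below are invented.

raw_rows = [
    # (month, machine hours, labor cost, materials cost) as spreadsheet text
    ("Jan", "1,000", "287",  "2,205"),
    ("Feb", "850",   "$255", "0"),     # note the stray dollar sign
    ("Mar", "1,100", "$347", "0"),
]

def to_number(cell):
    """Parse a spreadsheet-style cell: drop $ signs and comma separators."""
    return float(cell.replace("$", "").replace(",", ""))

clean_rows = []
for month, hours, labor, materials in raw_rows:
    labor_n, materials_n = to_number(labor), to_number(materials)
    clean_rows.append((month, to_number(hours), labor_n, materials_n,
                       labor_n + materials_n))  # total cost column
```

A parser like this fails loudly on a cell it can't handle, which is exactly what you want: better an error now than a silently dropped March.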
Let's give it a shot and see how it goes. We click over to Insert, find Charts, then Scatterplot; it's usually easiest if you highlight what you want to plot first, and then click. Okay, now notice we have the chart, but cost is on the x-axis and the driver is on the y-axis. This is backwards. If we click Select Data, we can choose what goes where. Another option, which I find is usually easiest in Excel, is to change what you feed to Excel rather than using Excel's options to switch things. So I'm just going to move the cost driver to the left column, and then Excel will understand what I'm trying to do. Okay, rather than trying to quickly learn how to use a PC in front of all of you, I'm just going to switch to the next page, where I did this already: I moved machine hours to the front, selected columns B and C, clicked Insert, found the scatter plot, and clicked it. So now we have it: machine hours and labor costs. This is our first step, and we're basically done with that portion of the question. Then a quick way to do the next plot is to copy and paste this chart, double-click it, choose Select Data, and move over one column to select materials. So we can do the same thing to get the materials column. Now, if we look at these side by side: the labor graph looks great; machine hours and labor costs really move together in a very nice way. But materials looks weird; there's something here that's not quite right. If we were doing the workflow properly, we would make a bunch of choices here so that what I'm about to show you doesn't happen, but I'm just going to leave this for now.
Remember, if we go back to the question: we purchase the materials quarterly. So there are three purchases in this data set, and in the other months we just aren't purchasing. Those months were recorded as zero in the accounting system, but it might actually be better to record them as nothing, as missing, because we shouldn't be measuring at those points at all. And if we look at total cost, you notice that total cost is actually contaminated, because it just so happens that we purchased materials in the low machine-hour months. This is kind of an interesting situation to be in, but the difference between these dots has more to do with timing than anything else. Let's pop back and look at the question. Okay, so that's the whole first question: just create the plots. Next, we want to compute estimates of the three cost functions from requirement one using linear regression. Now, there are a bunch of ways you can calculate a linear regression. In Excel, you can click under Data, click through, and select Regression, and it will give you an estimate. We could do that here; it doesn't seem to be installed on this podium computer, but there is a way to calculate the regression, and then you get all of those statistics that statistical programs like to spit out.
So there are just big boxes of parameters everywhere. But that's actually not necessarily what we care about, especially not in this situation, because all of that statistical precision is going to get in the way of the really simple facts that we need to notice with our eyes in this problem. So in this example I'm going to use the simplest way in Excel to calculate the linear regression, because I want you to focus less on the technique of calculating the regression and more on what we're going to do with it and learn from it. Okay, so the easy way (well, I said easy way, and it might not be): we're going to select all the data, and now we want to get the regression lines. The way we get these regression lines is to find the trend line option in the menu for the plot. We come down to the series, come over to Trendline, select Trendline Options, and select Linear, and that gives us a linear function, a straight line. This is the regression, the ordinary least squares estimate. Again, this is a review from 2200 and hopefully some other courses you've had: the simple ordinary least squares line is just the line that is simultaneously closest to all of the points of data that you're plotting. Then I clicked over here, selected Display Equation on the chart, and clicked Display R-squared Value on the chart. Now, if I ask you how well the data fit this line, many of you will respond by telling me what the R-squared is. That's correct, but the R-squared is a statistic about the data.
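For those following along in Python, here is a minimal ordinary least squares sketch that produces the same slope, intercept, and R-squared that Excel displays on the chart. The machine-hours and labor-cost numbers are hypothetical stand-ins for the problem's columns.

```python
# Ordinary least squares by hand: the line "simultaneously closest to all
# the points," plus the R-squared Excel prints on the chart. Data invented.
from statistics import mean

x = [46, 48, 60, 62, 68, 72, 78, 82, 88, 94]                    # machine hours
y = [710, 963, 770, 1004, 1190, 917, 1180, 1316, 1211, 1032]    # labor cost

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)

slope = s_xy / s_xx                    # variable cost per machine hour
intercept = y_bar - slope * x_bar      # estimated fixed cost
r_squared = s_xy ** 2 / (s_xx * s_yy)  # share of variation the line explains

def predict(hours):
    """Forecast cost at a given activity level (stay within the data range)."""
    return intercept + slope * hours
```

Writing the formulas out by hand keeps the focus on what the slope and intercept mean, rather than on the button you click to get them.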
It's the ratio of the variation explained by the fitted line to the total variation in the data, and that is related to how well your line fits, but it's not perfect. There are ways to add parameters that don't do anything yet mechanically increase your R-squared, so I don't think the R-squared is the be-all and end-all of model fit. But it is a useful statistic in this case: you can see that this is a high R-squared, and when we model similar data over here, we end up with 60%, which is still, in a lot of cases, a very high R-squared, but you can see that this line fits less well; we'll talk about that in a second. The thing I want you to keep in mind is this: if I look at this graph and I'm going to make a recommendation to someone, say for a choice that requires moving from 30 labor hours to 40 labor hours, I would feel pretty comfortable saying that the point on this line is a good forecast for the actual cost of labor we're going to consume. And I could look at the contracts we have with employees and verify that this relationship between labor and cost is actually reflected in the contracts of the people we have to hire. This is concrete; we can verify it. Now, in this graph I did the same thing: I found the trend line options (it always takes me forever to find them; I usually go into the Help dialog, ask where the trend line options are, and then click where it tells me), selected the trend line, and turned on the equation in the chart. I did not force the intercept to be zero, because that would put the line down here; it would require zero cost at zero volume, and we actually want to estimate the intercept. Okay, so this line's slope is negative, and it says the cost of making nothing is $12,000, or $1,200, and
the cost of making 60 is zero, so this is clearly wrong. This doesn't make any sense. First of all, total cost should be going up. Now, you could conceivably see total cost go up and then drop a little above some level, say if we sell off some real estate somewhere, move the company somewhere else, really restructure the firm. But it's really rare to see something like this, where total cost is decreasing. Total cost, not average cost, not marginal cost. This is an anomaly; this is something weird. The weird thing is that these values are not zero; they should be missing. This is just a mistake, this is just wrong, and the data shouldn't be presented this way; it gives us two incorrect models. This is why I want you to plot the data and look at it, then put your equation on the plot and look at it and think about whether it makes sense, because if you ship it like this, you'll make a mistake. We have a similar situation down here; let's come back to the chart equation. It's not quite as negative, but it's still negative. So what can we do in this situation? We can take a number of approaches. One thing we can do is aggregate the data to a more meaningful level of analysis: we could do this at the quarterly level. We don't have very much data at that point, but you can see that when we go to the quarterly level, we're back to a sensible relationship between the driver and cost, and we have a much more meaningful data set. So in this case, one solution would be to aggregate. We have two minutes left; I purposely left out all the Python stuff so I wouldn't run out of time.
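Both fixes discussed above, marking the no-purchase months as missing and aggregating up to quarters, are easy to sketch in Python. All figures here are invented for illustration.

```python
# Two fixes for the "recorded as zero" problem, with invented figures:
# treat no-purchase months as missing, or aggregate the monthly series up
# to quarters so each purchase lines up with a quarter of machine hours.
import math

monthly_hours = [68, 88, 62, 72, 60, 96]                  # machine hours by month
monthly_materials = [2205.0, 0.0, 0.0, 2310.0, 0.0, 0.0]  # bought once a quarter

# Fix 1: mark no-purchase months as missing (NaN) so they drop out of any
# later estimate instead of dragging the fitted line down.
as_missing = [x if x > 0 else math.nan for x in monthly_materials]
usable = [x for x in as_missing if not math.isnan(x)]

# Fix 2: sum each series into consecutive three-month quarters.
def to_quarters(monthly):
    """Roll a monthly series up into quarterly totals."""
    return [sum(monthly[i:i + 3]) for i in range(0, len(monthly), 3)]

quarterly_hours = to_quarters(monthly_hours)          # [218, 228]
quarterly_materials = to_quarters(monthly_materials)  # [2205.0, 2310.0]
```

At the quarterly level each materials purchase pairs with a full quarter of activity, so the timing artifact disappears, at the cost of fewer observations.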
We did run out of time, but I have the problem three data, which we'll clean in exactly the same way: just calculate total cost and move the miles column to the front. The P3 plot illustrates things quite nicely. This looks like pretty good data, and what I would say about operating costs is: we know the date of this observation, so we should ask the people who made the products why that one happened. Why is that number so high? Here, maintenance costs going down is puzzling, but again this raises the question of why maintenance costs are going down. It may be that maintenance costs have a time trend: the market price of maintenance might be declining while the amount we're producing is growing. If our company is growing while the market price of maintenance falls, then as our mileage goes up, the price gets cheaper, and it will look like our maintenance costs are going down. But in reality we're consuming more maintenance; it's the price that's changing, potentially. So this raises more questions than it answers, which is kind of the final point. On an exam, I'm going to ask you basically exactly what I asked you for the homework. Here I want you to discuss everything that's interesting about the graphs: tell me what's clear, tell me what's unclear. The sort of answer that would be unambiguously wrong on these questions is "we know absolutely everything, this model is perfect, nothing outside the model exists." That's wrong. But if you say "I have some theories about what could be happening, and these are the things I want to check next," that's a great answer. Open-ended, with some ideas about what to do next, showing that you understand what the graph says.
You understand why that's worth asking more questions about. That's where I want you to be; that's the sort of answer I think of as a correct answer to this sort of question. That's one minute over time, so thanks a ton. Email me if you have questions, and we'll see you on Thursday. Thursday's question is going to be confusing; it's a tricky question, and that's fine. Think about how you solve that constrained maximization problem. Good luck.