matlab_-_datatypes

This shows you the differences between two versions of the page.


matlab_-_datatypes [2012/10/06 04:43] jochen |
matlab_-_datatypes [2012/10/06 17:58] jochen Indexing |
||
---|---|---|---|

Line 5: | Line 5: | ||

A variable can be thought of as a storage container that has the following properties: | A variable can be thought of as a storage container that has the following properties: | ||

* links a name (identifier) to a value (or list of values) | * links a name (identifier) to a value (or list of values) | ||

- | * is available in a workspace, i.e. when a function is called, variables available in the calling workspace are hidden, unless they are **''global''** variables (different function code files can use the same variable names without conflict) | + | * is available in a workspace, i.e. when a function is called, variables available in the calling workspace are hidden (different function code files can use the same variable names without conflict), unless they are **''global''** variables |

* can be assigned a (new) value (or list of values) using the **''=''** sign (assignment operator) | * can be assigned a (new) value (or list of values) using the **''=''** sign (assignment operator) | ||

* has a specific datatype, which can change during the course of a program (or command line session) | * has a specific datatype, which can change during the course of a program (or command line session) | ||

Line 82: | Line 82: | ||

A special case is the **''logical''** datatype. It can only store two values: **''false''** or ''**true**''. If converted to any of the other numeric datatypes, **''false''** is converted to **''0''** and **''true''** is converted to **''1''**. | A special case is the **''logical''** datatype. It can only store two values: **''false''** or ''**true**''. If converted to any of the other numeric datatypes, **''false''** is converted to **''0''** and **''true''** is converted to **''1''**. | ||

- | ===== Character datatype ===== | + | Please note that, unlike in C/C++, the names of datatypes are not keywords ("reserved" words). They should nevertheless not be used as variable identifiers, because those names are also the names of the functions used to convert the numeric datatypes into one another! For instance, the code snippet <code matlab>dvar = [7, -10, 13]; ivar = int32(dvar);</code> converts the **''double''** variable dvar (default numeric datatype!) into a variable of type **''int32''**. If the target datatype cannot hold the value(s), for instance because its value range is too small, the value is adapted to fit the new datatype: when converting to an integer type, fractional values are rounded to the nearest integer and values outside the range are set to the closest limit of that range (saturation). |
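
To illustrate what happens when the target datatype cannot hold a value, here is a minimal sketch (the values are made up for illustration). As far as I know, Matlab rounds to the nearest integer and clips at the limits of the target range when converting a **''double''** to an integer type:

```matlab
% example values: fractional, negative, and too large for uint8
dvar = [2.5, -2.7, 300, -5];

% int32 can hold the range, so values are only rounded
as_int32 = int32(dvar);

% uint8 only covers 0 through 255, so values are rounded AND clipped
as_uint8 = uint8(dvar);
```

The variable ''as_int32'' then holds 3, -3, 300, -5, whereas ''as_uint8'' holds 3, 0, 255, 0.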

+ | | ||

+ | ==== Character datatype ==== | ||

Given that Matlab variables can be arrays of arbitrary size, single characters, as well as "strings" (a series of characters, such as in a word or sentence) and also lists of strings (two-dimensional field of characters) all are stored with the same basic datatype: **''char''**. | Given that Matlab variables can be arrays of arbitrary size, single characters, as well as "strings" (a series of characters, such as in a word or sentence) and also lists of strings (two-dimensional field of characters) all are stored with the same basic datatype: **''char''**. | ||

Line 98: | Line 100: | ||

% is their distance in the alphabet, in this case 3!</code> | % is their distance in the alphabet, in this case 3!</code> | ||

- | ===== Cell compound data ===== | + | And that also means that a variable can be converted from **''char''** to **''double''** (or any other compatible numeric datatype) and back. This can be useful to store large quantities of text and also to perform arithmetic operations on characters (for instance to test if all elements of a string are letters). |
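
The conversion between **''char''** and **''double''** can be sketched as follows (the variable names are only for illustration; the letter test assumes plain ASCII letters):

```matlab
% a string is a 1xN char array
word = 'Matlab';

% convert to the numeric character codes and back
codes = double(word);
back = char(codes);

% arithmetic/comparison on characters: test if all elements are letters
only_letters = all((word >= 'A' & word <= 'Z') | (word >= 'a' & word <= 'z'));

% shift every character by one position in the character table
shifted = char(word + 1);
```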

- | In many situations it is necessary to store data of different types (e.g. a name/string together with a number, such as age) in a "dataset", which still should be accessible via a single variable. For this purpose, Matlab provides the **''cell''** datatype. | + | |

- | To define a cell array as well as to address the **content** of a cell, Matlab uses the "curly braces" characters: **''<nowiki>{</nowiki>''** and **''<nowiki>}</nowiki>''**:<code matlab>% define a 1x2 cell array with a name and an age | + | ==== Compound datatypes ==== |

+ | In many situations it is necessary to store data of different types (e.g. a name/string together with a number, such as age) in a "dataset", which still should be accessible via a single variable. For this purpose, Matlab provides built-in compound datatypes. | ||

+ | | ||

+ | These compound datatypes are further sub-divided into two types: one where elements are (mainly) addressed by numeric (or equivalent) indices, and one where elements are accessed by a field name in a tree-like structure (same rules as identifiers, but also keywords can be used as field names, given the syntax). | ||

+ | | ||

+ | === Cell datatype === | ||

+ | The **''cell''** datatype allows individual elements (as well as groups of elements), called "cells", to be accessed with numeric indexing. This is mostly useful for storing tabular data (or data where several "columns" should be accessible at once) when numeric-only data is too limited (e.g. in cases where text elements cannot be reasonably matched to numeric values a priori). | ||

+ | | ||

+ | To define a cell array as well as to address the **content** of a cell, Matlab uses the "curly braces" symbols: **''<nowiki>{</nowiki>''** and **''<nowiki>}</nowiki>''**:<code matlab>% define a 1x2 cell array with a name and an age | ||

name_and_age = {'John Doe', 41}; | name_and_age = {'John Doe', 41}; | ||

Line 113: | Line 122: | ||

Put differently, imagine a shelf with 5 jars on it. The entire shelf then represents a cell array (by the name of **''shelf''**). The expression **''shelf(3)''** will then return the third jar of the shelf. On the other hand, the expression **''shelf{3}''** returns the **content** of the third jar! | Put differently, imagine a shelf with 5 jars on it. The entire shelf then represents a cell array (by the name of **''shelf''**). The expression **''shelf(3)''** will then return the third jar of the shelf. On the other hand, the expression **''shelf{3}''** returns the **content** of the third jar! | ||
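
The shelf analogy can be tried out directly. This is a small sketch with made-up jar contents; the **''class''** function reports the datatype of each result:

```matlab
% a shelf with 5 jars (1x5 cell array) holding different datatypes
shelf = {'jam', 42, [1, 2, 3], 'honey', true};

% parentheses return the jar itself, i.e. a 1x1 cell array
third_jar = shelf(3);
jar_class = class(third_jar);

% curly braces return the content of the jar, here a 1x3 double array
third_content = shelf{3};
content_class = class(third_content);
```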

+ | |||

+ | Also, please note that if you use a variable (numeric, char, or compound!) when creating a new compound variable or setting one or several elements of a compound variable, the values in the compound variable will be **copies** of the content of the original variable (or rather, if the variable is altered, a copy is created). For instance:<code matlab>% assign a value to variables n and s | ||

+ | n = [12, 14, 10]; | ||

+ | s = 'size'; | ||

+ | |||

+ | % store variables in compound variable | ||

+ | c = {s, n}; | ||

+ | |||

+ | % re-assign index 2 of n variable | ||

+ | n(2) = 18; | ||

+ | |||

+ | % and then c still contains [12, 14, 10] in its second cell element!</code> | ||

+ | |||

+ | === Struct datatype === | ||

+ | The other compound datatype in Matlab is the **''struct''** type. It allows storing multiple values of different types, which are accessible via names. | ||

+ | |||

+ | Syntax-wise, the variable name (of the compound variable) is followed by a period (dot character, **''.''**) and a field name. If the field name itself is stored in a (char) variable, the syntax is **''struct_variable.(field_name)''**. | ||
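
Both forms of field access can be sketched like this (the variable ''person'' and its fields are made up for illustration):

```matlab
% create a 1x1 struct with two fields
person = struct('name', 'John Doe', 'age', 41);

% the field name is stored in a char variable
field_name = 'age';

% dynamic field access for reading ...
age_value = person.(field_name);

% ... and for writing
person.(field_name) = age_value + 1;
```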

+ | |||

+ | The obvious advantage is that code usually becomes less cryptic, given that instead of using syntax such as **''compound_var{3} = some_function(some_value);''** you would write **''compound_var.property = some_function(some_value);''** where ''property'' can be a more meaningful term (such as ''name'' or ''duration'') instead of using a numeric index (as with the cell syntax). | ||

+ | |||

+ | Please note that variables of type struct still can be of arbitrary size! This means that a list of structures (e.g. 5 people's names and their ages) could be stored in one variable. The syntax to access the 3rd person's name and age would then be:<code matlab>% read out one person's name and age | ||

+ | this_name = people(3).name; | ||

+ | this_age = people(3).age;</code> | ||

+ | |||

+ | In turn this means that if the index expression is omitted, the **''struct_var.fieldname''** syntax produces a list of values (possibly with different datatypes). The only useful way to capture such a list is by creating an ad-hoc cell array:<code matlab>% get the names of all people | ||

+ | all_names = {people.name};</code> | ||
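
For completeness, here is a self-contained sketch of such a struct array. The names and ages are invented, and creating the array by passing cell arrays to the **''struct''** function is just one of several possible ways:

```matlab
% passing 1x3 cell arrays to struct creates a 1x3 struct array
people = struct('name', {'Ann', 'Bob', 'Cy'}, 'age', {31, 45, 27});

% capture the list of names in an ad-hoc cell array
all_names = {people.name};

% if all values are numeric scalars, square brackets also work
all_ages = [people.age];
```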

+ | |||

+ | === Multi-layered compounding === | ||

+ | Each cell and struct field can contain any of Matlab's supported datatypes, including cell and struct arrays! This means that elaborate structures (or cells) can be created:<code matlab>% assign a part of a subfield's value | ||

+ | main_tree.leaf(3).subfield{4}(11:20) = 1;</code> | ||

+ | |||

+ | If you see such a piece of code this can be translated into | ||

+ | * main_tree is of datatype struct (and must be of size 1x1, i.e. a scalar struct, to be valid) | ||

+ | * one of the fields of main_tree is called leaf; it is also of datatype struct, has at least the field subfield, and presumably has at least 3 elements (see comment below) | ||

+ | * the subfield (of the third leaf!) is of type cell with at least 4 elements | ||

+ | * the 4th element of the subfield is a numeric array | ||

+ | * numeric indices 11 through 20 (of this numeric array) are set to 1 | ||

+ | |||

+ | Please be aware that this line of code is also valid if main_tree is not yet defined! In that case, the **''leaf''** field will be initialized as a 1x3 struct with field name subfield, but the first two "leaf" elements will have an empty subfield. Only the third leaf's subfield will be assigned a value, which will be a 1x4 cell array, of which only the 4th element will have a value. And that will be a 1x20 double array, of which the first 10 values are 0! | ||
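
This auto-creation can be checked with a few size queries. A minimal sketch, assuming that main_tree does not exist yet (hence the ''clear'' at the top; the names of the result variables are made up):

```matlab
% make sure the variable is not defined
clear main_tree

% single assignment that creates the entire nested structure
main_tree.leaf(3).subfield{4}(11:20) = 1;

% leaf is a 1x3 struct, the first two subfields are empty
leaf_size = size(main_tree.leaf);
first_empty = isempty(main_tree.leaf(1).subfield);

% the third subfield is a 1x4 cell, its 4th cell a 1x20 double
cell_size = size(main_tree.leaf(3).subfield);
values = main_tree.leaf(3).subfield{4};
```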

+ | |||

+ | As you can see, while Matlab's "auto-declaration" of variables can be quite useful, it sometimes makes code difficult to understand (given that no explicit declaration is ever given that pre-determines the shape or type of content for compound variables). | ||

+ | |||

+ | ==== Function handles ==== | ||

+ | The **''function_handle''** type is an advanced datatype which is used mainly in three cases: | ||

+ | * the type of operation that is to be applied to a variable (or expression) is not fully determinable before code is run, in which case a function handle can be used to call a variable function (instead of using a syntax construction that selects from all possible options, which still could be insufficient if a function can be user defined) | ||

+ | * an argument in a function call itself is "an operation" to be applied to certain values | ||

+ | * a function is only available in a certain context (private function or sub-function in an M-file), in which case a function handle can be created and returned, allowing functions outside the original scope to use this function after all | ||

+ | |||

+ | Given the fact that this feature is fairly advanced, I won't be giving any in-depth examples at this point. Just be aware that variables can also be of type **''function_handle''**. | ||

+ | |||

+ | ==== User-defined datatypes ==== | ||

+ | Matlab allows users to add functionality by creating text files with the **''.m''** extension. While this makes it possible to create new functions that can be applied to values, most modern languages also have object-oriented design patterns. Among those are method overloading, inheritance, and protected storage of properties. | ||

+ | |||

+ | For this purpose, Matlab allows defining new datatypes (classes) by adding folders with a leading **''@''** (at) symbol, and placing an M-file into this folder with the same name (without the **''@''** sign). | ||

+ | |||

+ | Again, this is fairly advanced coding and will not be discussed in-depth at this point. There are, however, a few important aspects I want to mention: | ||

+ | * the internal representation of objects is supposed to be of type struct | ||

+ | * indexing operations (incl. the struct syntax of **''variable.fieldname''**) can be overloaded on objects | ||

+ | * NeuroElf makes use of this feature by allowing a more C++/Visual Basic style syntax:<code matlab>% NeuroElf xff and xfigure examples | ||

+ | % load a VMR into an xff object | ||

+ | vmr = xff('some.vmr'); | ||

+ | |||

+ | % call the function in @xff/private/aft_Browse.m | ||

+ | vmr.Browse; | ||

+ | |||

+ | % get xfigure object of the main UI figure | ||

+ | neuroelf_gui; | ||

+ | global ne_gcfg; | ||

+ | mainfig = ne_gcfg.h.MainFig; | ||

+ | |||

+ | % switch to "page" 2 (this is overloaded differently in the @xfigure/subsref.m file) | ||

+ | mainfig.ShowPage(2);</code> | ||

+ | |||

+ | ===== Indexing ===== | ||

+ | It is both a blessing and a curse that Matlab uses the same language elements for passing arguments into a function and indexing into a non-scalar array:<code matlab>% creating a 3x3 variable with random numbers | ||

+ | randvals = randn(3, 3); | ||

+ | |||

+ | % accessing the value at the 3rd row and 2nd column | ||

+ | randvals(3, 2) | ||

+ | |||

+ | % computing the sum along the 2nd dimension | ||

+ | sum(randvals, 2)</code> | ||

+ | |||

+ | In both cases, common parentheses, **''(''** and **'')''**, are used to | ||

+ | * sub-select values (array elements) from a non-scalar variable | ||

+ | * pass arguments (in this case the variable ''randvals'' and the scalar value ''2'') into a function | ||

+ | |||

+ | One of the reasons is that even the syntax **''randvals(3, 2)''** can be seen as a function call (and, for user defined objects, actually leads to a function call!). | ||

+ | |||

+ | The blessing is that, for user defined objects, this can be used to create very elegant code. The curse, on the other hand, is that it cannot be determined from the code alone whether an expression is an indexing operation or a function call, in cases such as<code matlab>var_or_function(x, y);</code> | ||

+ | |||

+ | ==== Subscript indexing ==== | ||

+ | Given that, in principle, all of Matlab's variables support non-scalar content (i.e. more than a single value/element), it is necessary to allow access to individual elements. In the example above, this is exactly what happens with **''randvals(3, 2)''**, which selects the value in the 3rd row and 2nd column of the array. | ||

+ | |||

+ | But subscript indexing allows selecting not only a single element, but also ranges of elements. For this purpose, each indexing expression can be a list of indices. When a variable is accessed to "read" from it, each of these lists must contain only valid entries (integer numbers, with a minimum of 1 and a maximum according to the size of the array in that dimension). Other than that, there are no restrictions (for reading from an array). When writing to an array (subscript assignment), the indices in each list should be unique (if an index is repeated, only the last value written to that element is kept), but values can be greater than the existing size, in which case the variable is expanded accordingly: | ||

+ | |||

+ | <code matlab>% creating a 5x4, i.e. 5 rows and 4 columns, variable with random values | ||

+ | fivebyfour = randn(5, 4); | ||

+ | |||

+ | % reading the value at row 3, column 1 | ||

+ | fivebyfour(3, 1) | ||

+ | |||

+ | % reading the entire 2nd row | ||

+ | secondrow = fivebyfour(2, :) | ||

+ | |||

+ | % reading the 4th column | ||

+ | fourthcol = fivebyfour(:, 4) | ||

+ | |||

+ | % reading the 2nd to 4th row, 1st to 3rd column | ||

+ | smaller3x3 = fivebyfour(2:4, 1:3) | ||

+ | |||

+ | % reading all odd-numbered rows and columns | ||

+ | unevens = fivebyfour(1:2:end, 1:2:end) | ||

+ | |||

+ | % reading (in this order) 4th, 1st, and 3rd rows (complete) | ||

+ | r413 = fivebyfour([4, 1, 3], :) | ||

+ | |||

+ | % repeatedly reading the 2nd column | ||

+ | col2times3 = fivebyfour(:, [2, 2, 2])</code> | ||

+ | |||

+ | All of these expressions except the last are also suitable for assignment (the last one repeats an index, so the same column would be written several times), in which case the value or array being assigned must either be a scalar (which is then stored in all of the written-to elements) or match in size (i.e. to write into a 3x3 sub-part, only a 1x1 or 3x3 right-hand-side value/array can be used). | ||

+ | |||

+ | In addition to these expressions, write access also allows using indices that exceed the size: | ||

+ | |||

+ | <code matlab>% increase the size to 6-by-6 | ||

+ | fivebyfour(4:6, 4:6) = 1;</code> | ||

+ | |||

+ | Importantly, this performs two steps: | ||

+ | * first, the array is expanded, with all new elements being assigned the "neutral" element: | ||

+ | * for numeric variables this is 0 | ||

+ | * for logical variables this is **''false''** | ||

+ | * for char variables this is also 0 (which is NOT the blank character)! so this syntax should not be used to extend a string! | ||

+ | * for cell arrays this is an empty double array (empty cell content) | ||

+ | * for struct arrays all fields are empty double arrays | ||

+ | * next, the portion that is specified with the indexing expression is assigned the provided value(s) | ||

+ | |||

+ | This means that in the above example, the first three values of the 5th and 6th columns will be 0, as will the first three values of the new 6th row! | ||
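
The "neutral" fill elements can be made visible with a short sketch (the variable names are made up for illustration):

```matlab
% extending a string: positions 4 and 5 are filled with char(0), not blanks
txt = 'abc';
txt(6) = 'f';
pad_codes = double(txt(4:5));

% extending a cell array: the skipped cell contains an empty double array
c = {1};
c{3} = 'x';

% extending a logical array: the skipped element is false
flags = true;
flags(3) = true;
```

This is why a string should rather be extended by concatenation, e.g. ''[txt, 'f']'', than by writing beyond its end.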

+ | |||

+ | And finally, this type of indexing also allows **shrinking** an array by a special syntax:<code matlab>% remove 2nd row | ||

+ | fivebyfour(2, :) = []; | ||

+ | |||

+ | % remove 3rd and 4th column | ||

+ | fivebyfour(:, 3:4) = [];</code> | ||

+ | |||

+ | Please note: for higher dimensional variables (3D/4D), all but one indexing expression **must** be **'':''** (the colon character), because the region to be eliminated must be "removable" without interfering with array storage rules. | ||

+ | |||

+ | Overall, subscripts are the most "complete" form of indexing: as many expressions (arguments) are used as the variable has dimensions. If for instance a variable stores numeric values in 3 dimensions (such as an anatomical 3D image of a subject's head) in a, say, 256-by-256-by-256 array, you can then easily access a slice in any of the three dimensions: | ||

+ | |||

+ | <code matlab>% assuming that vol3d is a 3D array, getting three slices in the middle | ||

+ | xyslice = vol3d(:, :, 128); | ||

+ | xzslice = vol3d(:, 128, :); | ||

+ | yzslice = vol3d(128, :, :);</code> | ||
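
Please note that slices taken along the first or second dimension keep a dimension of length 1. The following sketch uses a small 4x4x4 stand-in for ''vol3d'' (an assumption to keep the example light) and the **''squeeze''** function to turn such slices into regular 2D arrays:

```matlab
% small stand-in volume; element k (in storage order) has the value k
vol3d = reshape(1:64, [4, 4, 4]);

% slice along the 2nd dimension: the result is 4x1x4, not 4x4
xzslice = vol3d(:, 2, :);
raw_size = size(xzslice);

% squeeze removes the dimensions of length 1
xzslice2d = squeeze(xzslice);
yzslice2d = squeeze(vol3d(2, :, :));
```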

+ | |||

+ | ==== Single expression indexing ==== | ||

+ | Matlab's internal storage works like this: in a multi-dimensional array, values are stored column by column, i.e. the first (row) index varies fastest, then the second (column) index, then the 3rd, then the 4th dimension, and so forth. That means that in a 3x3 variable, the order of values (in memory) is as follows: | ||

+ | |||

+ | <code matlab>% order of values in a 3x3 variable | ||

+ | ttvar(1, 1) | ||

+ | ttvar(2, 1) | ||

+ | ttvar(3, 1) | ||

+ | ttvar(1, 2) | ||

+ | ttvar(2, 2) | ||

+ | ttvar(3, 2) | ||

+ | ttvar(1, 3) | ||

+ | ttvar(2, 3) | ||

+ | ttvar(3, 3)</code> | ||

+ | |||

+ | The total number of values is simply the product of all dimension lengths (size). There are contexts in which, for instance, an operation has to be performed on each individual element of an array, regardless of position. Matlab thus allows accessing all elements with a single index expression: | ||

+ | |||

+ | <code matlab>% accessing element ttvar(2, 2) via single index | ||

+ | ttvar(5)</code> | ||
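
To translate between subscripts and the single index, Matlab provides the **''sub2ind''** and **''ind2sub''** functions. A small sketch, where ''ttvar'' is filled such that each element holds its own single index:

```matlab
% 3x3 array in which element k (in storage order) has the value k
ttvar = reshape(1:9, 3, 3);

% single index of the element at row 2, column 2
lin = sub2ind(size(ttvar), 2, 2);

% row and column of the element with single index 8
[row, col] = ind2sub(size(ttvar), 8);
```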

+ | |||

+ | Please note that this syntax can be extremely misleading, particularly for people unfamiliar with Matlab index expressions! There is, however, one extremely useful application for this, in which **all** values of an array are considered as a single column: | ||

+ | |||

+ | <code matlab>% computing the total average over a 3D volume | ||

+ | avgslice = mean(vol3d, 3); | ||

+ | avgcolumn = mean(avgslice, 2); | ||

+ | totalavg = mean(avgcolumn); | ||

+ | |||

+ | % alternatively, by "columnizing" the array, compute in one step | ||

+ | totalavg = mean(vol3d(:));</code> | ||

+ | |||

+ | Please note that for multi-dimensional arrays (including 2D matrices), this syntax leads to an error if the index exceeds the total number of elements, i.e. such an array cannot be resized with a single index expression! Only vectors can be extended this way. | ||
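
The difference between vectors and matrices in this respect can be sketched as follows (a ''try''/''catch'' block is used so that the example runs through despite the error):

```matlab
% a vector can be extended with a single index (the gap is filled with 0)
v = [1, 2, 3];
v(5) = 9;

% a 3x3 matrix cannot: Matlab does not know along which dimension to grow
m = zeros(3, 3);
grew = true;
try
    m(12) = 1;
catch
    grew = false;
end
```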

+ | |||

+ | ==== Variables as indices ==== | ||

+ | Matlab also allows (numeric) variables (and return values of functions) to be used in index expressions:<code matlab>% create a 4x4 array with random numbers | ||

+ | four_by_four = randn(4, 4); | ||

+ | |||

+ | % select random row | ||

+ | rrowindex = ceil(4 * rand(1, 1)); | ||

+ | rrowdata = four_by_four(rrowindex, :) | ||

+ | |||

+ | % select random column without variable | ||

+ | rcoldata = four_by_four(:, ceil(4 * rand(1, 1))) | ||

+ | |||

+ | % and select a random value from an array without "knowing" its size | ||

+ | randomarrayvalue = four_by_four(ceil(numel(four_by_four) * rand(1, 1)))</code> | ||

+ | |||

+ | In this context, the **''size''** and **''numel''** functions of Matlab are important! **''size(variable)''** returns a 1-by-number-of-dimensions list of array sizes of the passed in variable (or other argument), and **''numel''** returns the total number of elements. | ||
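
A short sketch of what these functions return for a 3D array (the array ''vol'' is made up for illustration):

```matlab
% a 4-by-5-by-6 array of zeros
vol = zeros(4, 5, 6);

% size returns one entry per dimension
vol_size = size(vol);

% with a second argument, size returns the length of that dimension only
n_rows = size(vol, 1);

% total number of elements and number of dimensions
n_total = numel(vol);
n_dims = ndims(vol);
```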

+ | |||

+ | This can be used, for instance, when an operation has to be applied to data, say, slice by slice:<code matlab>% inquire about the size/dimensions of a volume | ||

+ | volsize = size(vol3d); | ||

+ | |||

+ | % "loop" (iterate) over all slices (in 3rd dimension) | ||

+ | for slice = 1:volsize(3) | ||

+ | |||

+ | % compute critical value | ||

+ | critval = critval_function(vol3d(:, :, slice)); | ||

+ | | ||

+ | % break (leave loop) if threshold is hit | ||

+ | if critval >= 10 | ||

+ | break; | ||

+ | end | ||

+ | end</code> | ||

+ | |||

+ | ==== Logical indexing ==== | ||

+ | On top of using numbers (or numeric expressions, variables, etc.) in indexing expressions, Matlab also supports selecting indices using a logical expression. The most typical application is by applying a comparison operator to establish a threshold: | ||

+ | |||

+ | <code matlab>% generate an array with 10 random numbers | ||

+ | r = randn(10, 1); | ||

+ | |||

+ | % sum all numbers that are greater than (or equal to) 0 | ||

+ | pos_sum = sum(r(r >= 0))</code> | ||

+ | |||

+ | The indexing expression **''(r >= 0)''** performs an element-by-element comparison with 0. This comparison creates (temporarily) an array of logical (true/false) values with the same size as ''r'', which is then used to "select" all values from r for which the comparison is true. | ||

+ | |||

+ | If the comparison operator is applied to a multi-dimensional array (and thus returns a multi-dimensional logical array), the indexing operation automatically converts the result into a column vector (given that arbitrary elements are selected, not allowing the result to be regularly shaped). | ||
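
For a 2D array this looks as follows. The values are chosen by hand so that the result is predictable; the selected values appear in storage order, i.e. column by column:

```matlab
% 3x2 array with positive and negative values
m = [1, -2; -3, 4; 5, -6];

% the comparison returns a 3x2 logical array
mask = m > 0;

% reading with the mask returns a column vector
positives = m(mask);

% a logical index also works for assignment: set all negative values to 0
m(m < 0) = 0;
```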

+ | |||

+ | Logical indexing can also be used with the subscript notation (more than one expression), in which case each logical expression ought to have as many elements as the size of the variable in that dimension:<code matlab>% create a 10-by-10 array with random numbers | ||

+ | r10x10 = randn(10, 10); | ||

+ | |||

+ | % sub-select some rows and some columns | ||

+ | randompart = r10x10(randn(1, 10) > 0, randn(1, 10) > 0)</code> |

matlab_-_datatypes.txt · Last modified: 2012/10/06 17:58 by jochen