matlab_-_datatypes

Differences

This shows you the differences between two versions of the page.

matlab_-_datatypes [2012/10/05 18:37] jochenmatlab_-_datatypes [2012/10/06 04:08] – re-written jochen
Line 1: Line 1:
-====== Matlab - datatypes ====== +====== Matlab - variables, datatypes, and indexing ====== 
-This page covers the basic (built-in) datatypes available in Matlab, together with some notes on how they can be used, how they can be converted, and how they relate to one another.+This page covers the use of variables in Matlab, the basic (built-in) datatypes, together with some information on how they can be used, how they can be converted, and how they relate to one another. The last part covers the different indexing syntax mechanisms and some pitfalls.
  
-===== General notes ===== +===== Variables ===== 
-Variables in Matlab have the following general properties: +A variable can be thought of as a storage container that has the following properties: 
-  * variables are **accessible via an identifier** that must begin with a letter, may contain numbers and underscores, but no symbols or blanks:<code matlab>% some valid variable names with assignments: +  * links a name (identifier) to a value (or list of values) 
 +  * is available in a workspace, i.e. when a function is called, variables available in the calling workspace are hidden (different function code files can use the same variable names without conflict), unless they are **''global''** variables 
 +  * can be assigned a (new) value (or list of values) using the **''=''** sign (assignment operator) 
 +  * has a specific datatype, which can change during the course of a program (or command line session) 
 +    * changing datatypes is considered bad coding practice and should be avoided:<code matlab>% define x as a number 
 +x = 1; 
 + 
 +% re-define x as a string: no error! 
 +x = 'string';</code> 
 +  * is of arbitrary size (which means it can also be empty), which can also change (older versions would allow up to 63 dimensions, but this limit no longer exists) 
 +    * that means that a variable containing a single number is of the same type as a variable containing several numbers (of the same type) 
 +  * does not require declaration (like in C/C++) but can be created at any time (on the command line or at any point in a program, i.e. M-file) 
 +    * this unfortunately makes it sometimes difficult to find out what an identifier stands for (function or variable? what datatype and content?) 
 +  * stores this value until it is reassigned a new value or the variable is "''clear''-ed" from the workspace 
 +  * allows indexing operations to access sub-parts of a list of values (both reading from and writing into parts of variables) 
 +    * to read-access a sub-portion of an array, the index expression has to be provided within parentheses on the right-hand side (e.g. **''part = fullarray(portion);''**) 
 +    * on the left-hand side, a part of an array can be replaced with new data (e.g. **''fullarray(portion) = newvalues;''**) 
 +    * in that case, **''newvalues''** must either be a single number (all indices addressed by **''portion''** will be set to the same number) or must match in size 
 +    * if the variable is smaller than indicated by the index expression, Matlab will attempt to grow the variable accordingly:<code matlab>% define a 2x3 array 
 +a = [10, 20, 30; 40, 50, 60]; 
 + 
 +% assign the value 100 to the second through 4th row and the 3rd through 4th column 
 +a(2:4, 3:4) = 100; 
 + 
 +% a is now a 4x4 array!</code> 
 +  * can be used in expressions (e.g. in computations, function calls, to index another variable, or to form new, compound variables) 
 +  * is available for storing the data contained therein to a file on disk and can be loaded from disk as well 
 + 
 +Please note that at the end of a function (including when the keyword **''return''** is reached while executing a function) all variables that are not "returned" or marked as **''persistent''** are cleared from the workspace and memory. 
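
As a minimal sketch of the read and write indexing described in the list above (the names ''fullarray'', ''part'', and ''newvalues'' follow the text; the values are made up for illustration):

<code matlab>
% a 1x5 array to index into
fullarray = [10, 20, 30, 40, 50];

% read access: index expression on the right-hand side
part = fullarray(2:4);        % part is [20, 30, 40]

% write access with a single number: every addressed element gets that value
fullarray(1:2) = 0;           % fullarray is [0, 0, 30, 40, 50]

% write access with matching size: three indices, three new values
newvalues = [7, 8, 9];
fullarray(3:5) = newvalues;   % fullarray is [0, 0, 7, 8, 9]
</code>

If ''newvalues'' had, say, two elements while three indices are addressed, Matlab would raise an error instead of guessing how to fill the gap.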
 + 
 +==== Variable identifiers ==== 
 +There are a few rules applying to identifiers: 
 +  * a valid identifier must contain only letters (lower and upper case), numbers, and underscores, but no symbols or blanks 
 +  * it must begin with a letter 
 +    * some valid variable names with assignments:<code matlab>
 v = 1;
 VAR12 = 12;
 A_Really_Long_Name = 'long name';</code>
-  * please note that if a variable is given the same name as an existing function, the identifier then only refers to the variable (see also [[Matlab - precedence|precedence rules]]):<code matlab>% defining a new array +  * **keywords cannot be used as identifier names**, which excludes the following words from being identifier/variable names: **''break, case, catch, classdef, continue, else, elseif, end, for, function, global, if, otherwise, parfor, persistent, return, spmd, switch, try, while''** 
 +  * if a variable is given the same name as an existing function, the identifier then only refers to the variable in this workspace (see also [[Matlab - precedence|precedence rules]]):<code matlab>% defining a new array
 newarray = [1, 2, 3, 4];
  
Line 19: Line 54:
 % then this leads to an error...
 notthesum = sum(newarray); % index exceeds dimensions!!</code>
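
The identifier rules above can also be checked programmatically. The following sketch uses Matlab's built-in ''isvarname'', ''iskeyword'', and ''exist'' functions; the candidate names are made up for illustration:

<code matlab>
% isvarname returns true only for valid identifiers
ok1 = isvarname('A_Really_Long_Name');  % true: letters and underscores
ok2 = isvarname('2fast');               % false: begins with a digit
ok3 = isvarname('my var');              % false: contains a blank
ok4 = isvarname('end');                 % false: keyword

% iskeyword tests a single word against the list of keywords
kw = iskeyword('end');                  % true

% exist returns 0 if a name is unused; a non-zero value means that a
% variable, function, or file of that name exists and would be shadowed
clash = exist('sum');
</code>

Checking a name with ''exist'' before using it as a variable is one way to avoid accidentally hiding a function such as ''sum''.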
-  * variables can be defined (created) at any time and do not require a declaration (such as in C/C++); this unfortunately makes it sometimes difficult to find out what an identifier stands for (function or variable? what datatype and content?) 
-  * variables can change type and size at any time in the code (although this is bad coding practice and should be avoided):<code matlab>% define x as a number 
-x = 1; 
  
-% re-define x as a string: no error! +===== Datatypes ===== 
-x = 'string';</code> +Datatypes can, generally, be divided into 5 major groups: 
-  * access (availability) of variables is organized in workspaces (different function code files can use the same variable names without conflict) +  * numeric datatypes (incl. logical datatype for **''true''**/**''false''** values) 
-  * each variable type supports multidimensional size (older versions would allow up to 63 dimensions, but this limit no longer exists) +  * text (character/string) datatype 
-  * that means that a variable containing a single number is of the same type as a variable containing several numbers (of the same type) +  * compound datatypes (to store values of different types in one variable) 
-  * to access a sub-portion of an array, an index expression has to be provided within parentheses (e.g. **''part = fullarray(portion);''**) +  * function handles 
-  * this sub-portion access also works when only a part of an array is to be replaced with new data (e.g. **''fullarray(portion) = newvalues;''**) +  * user-defined datatypes/objects
-  * in that case, **''newvalues''** must either be a single number (all indices addressed by **''portion''** will be set to the same number) or must match in size +
-  * if the variable is smaller than indicated by the index expression, Matlab will attempt to grow the variable accordingly:<code matlab>% define a 2x3 array +
-a = [10, 20, 30; 40, 50, 60];+
  
-% assign the value 100 to the second through 4th row and the 3rd through 4th column +==== Numeric datatypes ====
-a(2:4, 3:4) = 100; +
- +
-% a is now a 4x4 array!</code> +
- +
-===== Numerical datatypes =====+
 By default, a variable in Matlab that is storing a numeric value (or a list/array of numbers) has the datatype "double". So, in a simple assignment of a number (or array) to a variable (such as **''<nowiki>a = 1;</nowiki>''** or **''<nowiki>b = [2, 3, 4];</nowiki>''**), the datatype would be double, regardless of whether the number is integer or not! While this requires more memory (for large arrays of numbers), at least the user (or code writer) doesn't have to worry about datatype conversions, etc. Unless you have very specific needs (e.g. lower memory usage or increased speed for specific operations), it is recommended to use the default datatype.
  
Line 58: Line 82:
 A special case is the **''logical''** datatype. It can only store two values: **''false''** or **''true''**. If converted to any of the other numeric datatypes, **''false''** is converted to **''0''** and **''true''** is converted to **''1''**.
  
-===== Character datatype ===== +Please note that, unlike in C/C++, the names of datatypes are not keywords ("reserved" words). They should nevertheless not be used as variable identifiers, because those names are also the names of the functions used to convert the numeric datatypes into one another! For instance the code snippet <code matlab>dvar = [7, -10, 13]; ivar = int32(dvar);</code> converts the **''double''** variable dvar (default numeric datatype!) into a variable of type **''int32''**. If the target datatype cannot hold the value(s), for instance because the value range is too small, the value will be adjusted to fit the new datatype: when converting to an integer type, fractional parts are rounded to the nearest integer and out-of-range values are set to the closest limit of the target range. 
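
As a sketch of what such a conversion does in practice: in current Matlab versions, converting to an integer type rounds to the nearest integer and clips values at the limits of the target range. The input values below are made up for illustration:

<code matlab>
dvar = [7.6, -10.2, 300, -300];

% int8 covers -128 ... 127: fractions are rounded, out-of-range values clipped
ivar = int8(dvar);          % ivar is [8, -10, 127, -128]

% unsigned types clip negative values at 0
uvar = uint8(dvar);         % uvar is [8, 0, 255, 0]

% class reports the datatype; intmin/intmax report the representable range
c = class(ivar);            % 'int8'
lims = [intmin('int8'), intmax('int8')];   % [-128, 127]

% logical values convert to 0 and 1
d = double([false, true]);  % [0, 1]
</code>

No warning is issued when values are clipped, so it is worth comparing against ''intmin'' and ''intmax'' before converting data of unknown range.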
 + 
 +==== Character datatype ====
 Given that Matlab variables can be arrays of arbitrary size, single characters, as well as "strings" (a series of characters, such as in a word or sentence) and also lists of strings (two-dimensional field of characters) all are stored with the same basic datatype: **''char''**.
  
Line 74: Line 100:
 % is their distance in the alphabet, in this case 3!</code>
  
-===== Cell compound data ===== +And that also means that a variable can be converted from **''char''** to **''double''** (or any other compatible numeric datatype) and back. This can be useful to store large quantities of text and also to perform arithmetic operations on characters (for instance to test if all elements of a string are letters).
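
The conversion between **''char''** and **''double''** mentioned here can be sketched as follows (the example string is made up for illustration; ''isletter'' is a built-in Matlab function):

<code matlab>
s = 'Hello';

% char -> double gives the character codes
codes = double(s);                  % [72, 101, 108, 108, 111]

% double -> char turns codes back into text
back = char(codes);                 % 'Hello'

% arithmetic on characters: shift every letter by one code point
shifted = char(s + 1);              % 'Ifmmp'

% test whether all elements are letters, via comparisons of character codes
is_alpha = all((s >= 'A' & s <= 'Z') | (s >= 'a' & s <= 'z'));   % true

% the same test with the built-in isletter function
is_alpha2 = all(isletter(s));       % true
</code>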
-In many situations it is necessary to store data of different types (e.g. name/string together with a number, such as age) in a "dataset", which still should be accessible via a single variable. For this purpose, Matlab provides the **''cell''** datatype.+
  
-To define a cell array as well as to address the **content** of a cell, Matlab uses the "curly braces" characters: **''<nowiki>{</nowiki>''** and **''<nowiki>}</nowiki>''**:<code matlab>% define a 1x2 cell array with a name and an age+==== Compound datatypes ==== 
 +In many situations it is necessary to store data of different types (e.g. a name/string together with a number, such as age) in a "dataset", which still should be accessible via a single variable. For this purpose, Matlab provides built-in compound datatypes. 
 + 
 +These compound datatypes are further sub-divided into two types: one where elements are (mainly) addressed by numeric (or equivalent) indices, and one where elements are accessed by a field name in a tree-like structure (same rules as identifiers, but also keywords can be used as field names, given the syntax). 
 + 
 +=== Cell datatype === 
 +The **''cell''** datatype allows accessing individual elements (as well as groups of elements), called "cells", with numeric indexing. This is mostly useful for storing tabular data (or data where several "columns" should be accessible at once) when numeric-only data is too limited (e.g. in cases where text elements cannot be reasonably matched to numeric values a priori). 
 + 
 +To define a cell array as well as to address the **content** of a cell, Matlab uses the "curly braces" symbols: **''<nowiki>{</nowiki>''** and **''<nowiki>}</nowiki>''**:<code matlab>% define a 1x2 cell array with a name and an age
 name_and_age = {'John Doe', 41};
  
Line 89: Line 122:
  
 Put differently, imagine a shelf with 5 jars on it. The entire shelf then represents a cell array (by the name of **''shelf''**). The expression **''shelf(3)''** will then return the third jar of the shelf. On the other hand, the expression **''shelf{3}''** returns the **content** of the third jar!
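
The shelf analogy can be tried out directly. In this sketch the contents of the jars are made up for illustration:

<code matlab>
% a shelf with 5 jars of mixed content
shelf = {'jam', 42, [1, 2, 3], 'honey', true};

% parentheses return a 1x1 cell: the jar itself
jar = shelf(3);
jar_type = class(jar);          % 'cell'

% curly braces return what is inside the jar
content = shelf{3};
content_type = class(content);  % 'double'

% parentheses can address several jars at once (still a cell array)
two_jars = shelf([1, 4]);       % {'jam', 'honey'}
</code>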
 +
 +Also, please note that if you use a variable (numeric, char, or compound!) when creating a new compound variable or setting one or several elements of a compound variable, the values in the compound variable will be **copies** of the content of the original variable (or rather, if the variable is altered, a copy is created). For instance:<code matlab>% assign a value to variables n and s
 +n = [12, 14, 10];
 +s = 'size';
 +
 +% store variables in compound variable
 +c = {s, n};
 +
 +% re-assign index 2 of n variable
 +n(2) = 18;
 +
 +% and then c still contains [12, 14, 10] in its second cell element!</code>
 +
 +=== Struct datatype ===
 +The other compound datatype in Matlab is the **''struct''** type. This allows storing multiple values of different types, accessible via names.
 +
 +Syntax-wise, the variable name (of the compound variable) is followed by a period (dot character, **''.''**) and a field name. If the field name itself is stored in a (char) variable, the syntax is **''struct_variable.(field_name)''**.
 +
 +The obvious advantage is that code usually becomes less cryptic, given that instead of using syntax such as **''compound_var{3} = some_function(some_value);''** you would write **''compound_var.property = some_function(some_value);''** where ''property'' can be a more meaningful term (such as ''name'' or ''duration'') instead of a numeric index (as with the cell syntax).
 +
 +Please note that variables of type struct still can be of arbitrary size! This means that a list of structures (e.g. 5 people's names and their ages) could be stored in one variable. The syntax to access the 3rd person's name and age would then be:<code matlab>% read out one person's name and age
 +this_name = people(3).name;
 +this_age = people(3).age;</code>
 +
 +In turn this means that if the index expression is omitted, the **''struct_var.fieldname''** syntax produces a list of values (possibly with different datatypes). The only useful way to capture such a list is by creating an ad-hoc cell array:<code matlab>% get the names of all people
 +all_names = {people.name};</code>
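
A self-contained sketch of the struct syntax described above. The ''people'' array and its values are made up for illustration; ''struct'' with cell array arguments is the built-in way to create a struct array in one call:

<code matlab>
% build a 1x3 struct array; the cell arrays spread the values over the elements
people = struct('name', {'Ann', 'Bob', 'Cy'}, 'age', {31, 45, 27});

% regular field access on one element
this_name = people(3).name;          % 'Cy'

% dynamic field access: the field name is stored in a char variable
field_name = 'age';
this_age = people(3).(field_name);   % 27

% omitting the index yields a list of values, captured here in a cell array
all_names = {people.name};           % {'Ann', 'Bob', 'Cy'}

% if all values are numeric scalars, square brackets concatenate them
all_ages = [people.age];             % [31, 45, 27]
</code>

The square-bracket form only works when the values can be concatenated into one array; for text or mixed content the cell array form is the safer choice.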
 +
 +=== Multi-layered compounding ===
 +Each cell and struct field can contain any of Matlab's supported datatypes, including cell and struct arrays! This means that elaborate structures (or cells) can be created:<code matlab>% assign a part of a subfield's value
 +main_tree.leaf(3).subfield{4}(11:20) = 1;</code>
 +
 +If you see such a piece of code this can be translated into
 +  * main_tree is of datatype struct (and must be of size 1x1, i.e. a scalar struct, to be valid)
 +  * one of the fields of main_tree is called leaf; it is also of datatype struct, has at least the field name subfield, and presumably has at least 3 elements (see comment below)
 +  * the subfield (of the third leaf!) is of type cell with at least 4 elements
 +  * the 4th element of the subfield is a numeric array
 +  * numeric indices 11 through 20 (of this numeric array) are set to 1
 +
 +Please be aware that this line of code is also valid if main_tree is not yet defined! In that case, the **''leaf''** field will be initialized as a 1x3 struct with field name subfield, but the first two "leaf" elements will have an empty subfield. Only the third leaf's subfield will be assigned a value, which will be a 1x4 cell array, of which only the 4th element will have a value. And that will be a 1x20 double array, of which the first 10 values are 0!
 +
 +As you can see, while Matlab's "auto-declaration" of variables can be quite useful, it sometimes makes code difficult to understand (given that no explicit declaration is ever given that pre-determines the shape or type of content for compound variables).
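
The auto-creation described above can be inspected with a few size queries. This is a sketch that assumes ''main_tree'' does not exist beforehand, which the ''clear'' call makes sure of:

<code matlab>
% make sure main_tree does not exist yet
clear main_tree;

% a single assignment creates every layer on the way
main_tree.leaf(3).subfield{4}(11:20) = 1;

leaf_size  = size(main_tree.leaf);                  % [1, 3]
first_leaf = isempty(main_tree.leaf(1).subfield);   % true
cell_size  = size(main_tree.leaf(3).subfield);      % [1, 4]
values     = main_tree.leaf(3).subfield{4};         % 1x20 double
</code>

Elements 1 through 10 of ''values'' are filled with 0, elements 11 through 20 hold the assigned 1.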
 +
 +==== Function handles ====
 +The **''function_handle''** type is an advanced datatype which is used mainly in three cases:
 +  * the type of operation that is to be applied to a variable (or expression) is not fully determinable before code is run, in which case a function handle can be used to call a variable function (instead of using a syntax construction that selects from all possible options, which still could be insufficient if a function can be user defined)
 +  * an argument in a function call itself is "an operation" to be applied to certain values
 +  * a function is only available in a certain context (private function or sub-function in an M-file), in which case a function handle can be created and returned, allowing functions outside the original scope to use this function after all
 +
 +Given the fact that this feature is fairly advanced, I won't be giving any in-depth examples at this point. Just be aware that variables can also be of type **''function_handle''**.
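
For orientation only, a short sketch of the first two use cases listed above. The flag ''use_max'' and all values are made up for illustration; ''cellfun'' is a built-in function that takes a function handle as its first argument:

<code matlab>
% a handle to an existing function, created with the @ symbol
f = @sqrt;
r = f(16);                                   % 4

% the operation is only chosen at run time
use_max = true;
if use_max
    op = @max;
else
    op = @min;
end
best = op([3, 9, 5]);                        % 9

% passing "an operation" as an argument: cellfun applies it to every cell
lens = cellfun(@numel, {'ab', 'cde', ''});   % [2, 3, 0]

% an anonymous function defined in place
add_offset = @(x) x + 10;
shifted = add_offset([1, 2]);                % [11, 12]
</code>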
 +
 +==== User-defined datatypes ====
 +Matlab allows users to add functionality by creating text files with the **''.m''** extension. While this makes it possible to create new functions that can be applied to values, most modern languages also support object-oriented design patterns. Among those are method overloading, inheritance, and protected storage of properties.
 +
 +For this purpose, Matlab allows defining new datatypes (classes) by adding folders with a leading **''@''** (at) symbol, and placing an M-file into this folder with the same name (without the **''@''** sign).
 +
 +Again, this is fairly advanced coding and will not be discussed in-depth at this point. There are, however, a few important aspects I want to mention:
 +  * the internal representation of objects is supposed to be of type struct
 +  * indexing operations (incl. the struct syntax of **''variable.fieldname''**) can be overloaded on objects
 +  * NeuroElf makes use of this feature by allowing a more C++/Visual Basic style syntax:<code matlab>% NeuroElf xff and xfigure examples
 +% load a VMR into an xff object
 +vmr = xff('some.vmr');
 +
 +% call the function in @xff/private/aft_Browse.m
 +vmr.Browse;
 +
 +% get xfigure object of the main UI figure
 +neuroelf_gui;
 +global ne_gcfg;
 +mainfig = ne_gcfg.h.MainFig;
 +
 +% switch to "page" 2 (this is overloaded differently in the @xfigure/subsref.m file)
 +mainfig.ShowPage(2);</code>
 +
matlab_-_datatypes.txt · Last modified: 2012/10/06 15:58 by jochen